I just don't understand why bot owners can't run a complete Windows 11 VM running Google Chrome, graphics acceleration and all.
You can probably run 50 of those simultaneously if you use memory page deduplication, and with a decent CPU+GPU you ought to be able to render 50 pages a second. That works out to about 1 cent per thousand page loads on AWS. Damn cheap.
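Back-of-the-envelope, with the instance price being an assumption (roughly a g4dn-class GPU instance on demand; adjust for your region):

    # Rough cost-per-page math. The hourly rate is an assumed
    # ballpark for a GPU instance, not a quoted AWS price.
    HOURLY_RATE_USD = 1.80    # assumed on-demand GPU instance price
    PAGES_PER_SECOND = 50     # figure from the comment above

    pages_per_hour = PAGES_PER_SECOND * 3600            # 180,000
    cost_per_page = HOURLY_RATE_USD / pages_per_hour    # $0.00001
    print(f"${cost_per_page * 1000:.3f} per thousand page loads")
    # -> $0.010 per thousand page loads, i.e. about 1 cent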
They do, but having to go to such lengths means there are fewer bots: a full browser stack is far less economical than a simple headless scraper, which is orders of magnitude cheaper.
There are scraping subreddits, and if you browse them you'll see that bot writers get very annoyed when they can't scrape a site with a headless browser.
You can do what you suggested, but with Linux VMs/containers. Windows is too heavy; each Windows VM will cost you ~4 GB of RAM.
284 VMs on 296 GB of RAM with deduplication enabled, on a 128-core host with a 32Q vGPU profile.
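For anyone trying to reproduce this: the deduplication in question is KSM (kernel samepage merging) on a Linux/KVM host. A minimal sketch of enabling it and checking the savings through the standard sysfs interface (writing these files needs root; QEMU marks guest memory mergeable by default, -machine mem-merge=on):

    from pathlib import Path

    KSM = Path("/sys/kernel/mm/ksm")
    PAGE = 4096  # bytes per page on x86-64

    # Enable the scanner and make it reasonably aggressive.
    (KSM / "run").write_text("1")
    (KSM / "pages_to_scan").write_text("1000")  # pages scanned per cycle
    (KSM / "sleep_millisecs").write_text("20")  # pause between cycles

    # pages_sharing counts duplicate pages now backed by one shared
    # copy, i.e. the actual memory saved.
    sharing = int((KSM / "pages_sharing").read_text())
    print(f"KSM is saving roughly {sharing * PAGE / 1024**3:.1f} GiB")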
I am reasonably sure that these kinds of fingerprints can detect whether the browser is inside a VM.
… yup?
I mean, you missed the minigame of preventing Chrome from signaling that it's being driven programmatically (webdriver etc.) and tipping your hand, but … yup?
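For the curious, here's a sketch of two checks those fingerprinting scripts commonly run, written with Playwright (assumes pip install playwright && playwright install chromium); point it at your own bot to see what it leaks:

    from playwright.sync_api import sync_playwright

    JS = """() => {
      const gl = document.createElement('canvas').getContext('webgl');
      const ext = gl && gl.getExtension('WEBGL_debug_renderer_info');
      return {
        webdriver: navigator.webdriver,  // true for vanilla automation
        renderer: ext ? gl.getParameter(ext.UNMASKED_RENDERER_WEBGL)
                      : 'no webgl',  // 'llvmpipe'/'SwiftShader' = VM tell
      };
    }"""

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("about:blank")
        print(page.evaluate(JS))  # what the fingerprinter sees
        browser.close()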
If you know of a simple way to run a Windows 11 VM with good graphics acceleration (no GPU passthrough), please contact me.
In theory you could run hundreds of full-fat Chrome bots if you don't care about the ops mess, but keeping Windows images stable while Cloudflare and friends keep changing the fingerprinting game turns the cheap math into a maintenance job from hell. AWS VM signals are a big red flag too, so you still eat CAPTCHAs and blocks even with a full browser stack. The per-page-load number looks cheap on paper; the total cost of ownership isn't.
There are myriad providers competing to offer this, nicely packaged with all the accoutrements (IP rotation, location spoofing, language settings, prebuilt parsers, etc.) behind an easy-to-use API.
Honestly, it's a very healthy, competitive market with reasonably low switching costs, which drives prices down. Those circumstances make rolling your own a tough sell.
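The integration is usually a single HTTP call. A sketch of the typical shape, where the endpoint and parameter names are hypothetical stand-ins for whichever provider you pick:

    import requests

    # Hypothetical endpoint and parameter names; vendors differ in
    # naming but not in shape: you hand over a URL plus rendering/geo
    # options, they handle proxies, fingerprints, and retries.
    API_KEY = "..."  # your provider key
    resp = requests.get(
        "https://api.example-scraper.com/v1/fetch",  # hypothetical
        params={
            "url": "https://news.ycombinator.com/",
            "render_js": "true",  # full browser render, not raw HTTP
            "country": "us",      # exit-node geolocation
        },
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=60,
    )
    resp.raise_for_status()
    html = resp.text  # rendered page, ready for your parser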