IMO the under-discussed risk here is that sites will start serving different content to verified crawlers vs real users. You're already seeing it with known search bots getting sanitized views. If your agent's context comes from a crawl the site knows is going to an AI, you have no guarantee it matches what a human sees, and that data quality problem won't surface until your agent starts acting on selectively curated information.
This could go wrong on so many levels.
This already happens in the opposite direction. See: news websites that drop their paywall for Googlebot.
This already has a name in SEO circles: "cloaking" - serving different content to crawlers than to humans. Google has penalized sites for this for years.
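Since cloaking is just User-Agent-conditional content, you can sanity-check a page by comparing what it serves under different identities. Here's a minimal sketch of the comparison step using only the standard library; the fetch itself is stubbed with simulated responses (in practice you'd request the same URL twice with a browser-like and a crawler-like User-Agent header), and the similarity threshold is an illustrative assumption, not a known-good value.

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of how similar two page bodies are."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def looks_cloaked(browser_html: str, crawler_html: str,
                  threshold: float = 0.9) -> bool:
    """Flag pages whose crawler-facing body diverges sharply from the
    browser-facing one. The 0.9 threshold is a tunable assumption."""
    return similarity(browser_html, crawler_html) < threshold

# Simulated responses standing in for two real fetches of the same URL:
human_view = "<html><body><p>Subscribe to read this article.</p></body></html>"
bot_view = "<html><body><p>Full article text, no paywall, served to bots.</p></body></html>"

print(looks_cloaked(human_view, bot_view))
```

Real detection is messier than this, of course: dynamic pages differ legitimately between fetches (ads, timestamps, A/B tests), so you'd want to strip volatile regions or compare rendered text rather than raw HTML before flagging anything.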
The ironic twist is that sites might actually want to serve curated content to AI crawlers - not to deceive, but to control their representation in AI systems. The incentive structure is already there.
The precedent with Google AMP cached content showed what happens: publishers optimized specifically for what the crawler saw, diverging from the actual user experience. The "real" web became secondary.
The real danger isn't fraud - it's gradual optimization pressure. If AI agents increasingly act on crawled data, site operators have every incentive to polish what the crawler sees. You end up with an AI-facing web that diverges from what humans actually experience.