Does mass scraping need Google for content discovery? Surely most sites contain a sitemap or index ...

hsbauauvhabzb • today at 6:57 AM • 1 reply

Does mass scraping need Google for content discovery? Surely most sites publish a sitemap or index that lets a scraper enumerate the whole site once it knows the domain, and the domain is publicly disclosed more often than not?
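To make the self-enumeration idea concrete, here is a rough sketch in Python. It assumes the site follows the sitemaps.org protocol and advertises its sitemap through a `Sitemap:` line in robots.txt, which not every site does. The function names (`sitemaps_from_robots`, `parse_sitemap`) and the example.com URLs are hypothetical, and fetching over the network is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect the URLs named in 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def parse_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (page_urls, nested_sitemap_urls) for one sitemap document.

    A <sitemapindex> only points at further sitemaps; a <urlset> lists pages.
    """
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    if root.tag == f"{SITEMAP_NS}sitemapindex":
        return [], locs
    return locs, []


if __name__ == "__main__":
    # Hypothetical inputs standing in for fetched documents.
    robots = "User-agent: *\nSitemap: https://example.com/sitemap.xml\n"
    urlset = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/a</loc></url>"
        "<url><loc>https://example.com/b</loc></url>"
        "</urlset>"
    )
    print(sitemaps_from_robots(robots))
    print(parse_sitemap(urlset))
```

A crawler built on this would fetch robots.txt, follow any nested sitemap indexes, and queue the page URLs. Sites with no sitemap, or with an incomplete one, would still need link crawling or an external index.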

Replies

rvz • today at 11:27 AM

What matters is whether websites put this new version of reCAPTCHA on their pages, as archive.is has done. Once they do, scrapers will have a hard time getting around it.
