Hacker News

namegulf yesterday at 9:36 PM

Robots.txt is lame BTW; there is no way to enforce it. It is up to the bot to decide whether to crawl or not, and in most cases they don't care.

Cloudflare has a nice technique to address the bot problem (if you use their name servers). It respects and uses robots.txt, while sending the remaining bots into a deep black hole.


Replies

input_sh yesterday at 10:03 PM

Yes, we know, its purpose is to guide the bots, not forcibly block them.

That said, one of the biggest websites in the world not respecting it is definitely a noteworthy story. Hopefully another one of the biggest websites in the world (formerly known as Twitter) eventually respects it as well instead of not even disclosing itself via a user agent and pretending to be Safari running on iOS.

marginalia_nu yesterday at 11:15 PM

Robots.txt is great if you're trying to run an above-board operation. It's much easier than trying to guess how a webmaster wishes the crawler to behave, and then getting angry emails when you guess wrong.
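
For illustration only, here is a minimal sketch of how a cooperative crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the `example.com` URLs, the rules, and the helper `is_allowed` are all hypothetical. Nothing here stops a bot that simply skips the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/public/page"),
        ("ExampleBot", "https://example.com/private/page"),
        ("OtherBot", "https://example.com/public/page"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The check is entirely voluntary on the crawler's side, which is the point made upthread: the file states the webmaster's wishes, and compliance is up to the bot.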

llbbdd yesterday at 10:37 PM

Yeah, robots.txt is a great example of the type of solution invented by people who don't understand incentives whatsoever.
