Robots.txt is great if you're trying to run an above-board operation. It's much easier than guessing how a webmaster wants your crawler to behave, and then getting angry emails when you guess wrong.
It's not great. It used to be very common for a robots.txt to Disallow everything and Allow only GoogleBot, which just entrenches the search engine monopoly. In response, other search engines simply applied the GoogleBot rules instead of the rules written for their own crawlers.
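For reference, the Googlebot-only pattern being described looks roughly like this (a minimal sketch, not taken from any particular site):

```
# Googlebot gets full access (an empty Disallow permits everything)
User-agent: Googlebot
Disallow:

# Every other crawler is blocked from the whole site
User-agent: *
Disallow: /
```

Per the robots.txt convention, a crawler uses the most specific `User-agent` group that matches it, so every bot other than Googlebot falls through to the catch-all block.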