
joecool1029 today at 2:29 AM

No, archive.org does NOT respect robots.txt. You need to reach out to them directly and ask that your site not be included: https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...


Replies

input_sh today at 5:46 AM

Aren't you choosing to ignore something very specific stated in that article? Why do you make it seem as if the article implies that's their overall policy?

> A few months ago we stopped referring to robots.txt files on U.S. government and military web sites for both crawling and displaying web pages (though we respond to removal requests sent to [email protected]).
