Hacker News

jcalvinowens, yesterday at 9:24 PM

I had to block Meta's ASN on my personal cgit server a few weeks ago because they were ignoring robots.txt and torching it. Like hundreds of megabytes of access logs just from them, spread across different network blocks in a clear attempt to defeat IP-based limiting. I couldn't believe it.
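To illustrate how traffic spread across many network blocks can be spotted in access logs, here is a minimal Python sketch that groups requests by /24 (IPv4) or /48 (IPv6). It assumes the client IP is the first whitespace-separated field of each log line, as in the common and combined log formats. The function name `hits_per_block`, the prefix lengths, and the sample lines (which use documentation address ranges) are illustrative assumptions, not anything from the original comment.

```python
import ipaddress
from collections import Counter


def hits_per_block(log_lines, v4_prefix=24, v6_prefix=48):
    """Count requests per network block.

    Assumes (hypothetically) that the client IP is the first field
    of each log line.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines that do not start with an IP address
        prefix = v4_prefix if addr.version == 4 else v6_prefix
        # strict=False masks the host bits, giving the enclosing block
        block = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        counts[block] += 1
    return counts


# Made-up sample lines using documentation address ranges
sample = [
    '192.0.2.10 - - [01/Jan/2025:00:00:01 +0000] "GET /a HTTP/1.1" 200 512',
    '192.0.2.77 - - [01/Jan/2025:00:00:02 +0000] "GET /b HTTP/1.1" 200 512',
    '198.51.100.5 - - [01/Jan/2025:00:00:03 +0000] "GET /c HTTP/1.1" 200 512',
    'not a log line',
]
counts = hits_per_block(sample)
for block, n in counts.most_common():
    print(block, n)
```

Many distinct blocks each contributing a modest share of a large total would be consistent with the pattern described above, though mapping those blocks back to an ASN needs a separate prefix-to-ASN data source.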


Replies

dawnerd, today at 4:16 AM

I had to last year too: nonstop crawling of random URLs that didn't exist. It looked like they were trying to proxy user queries through to a search endpoint too. The ASN matched, so I know it wasn't someone spoofing them.

bflesch, yesterday at 10:27 PM

IMO ASN-based blocking should be much more common, but unfortunately it is not supported as a first-class configuration option in many common tools.
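Where a tool has no native ASN option, one workaround is to expand the ASN into its announced prefixes and match client IPs against that list. The following is a minimal Python sketch of that idea. The prefixes are placeholders from the documentation address ranges, not any real network's announcements, and `BLOCKED_PREFIXES` and `is_blocked` are hypothetical names. In practice the list would have to come from a routing or registry data source and be refreshed regularly.

```python
import ipaddress

# Hypothetical prefix list standing in for "everything announced by one ASN".
# These are documentation ranges, NOT a real network's prefixes.
BLOCKED_PREFIXES = [
    ipaddress.ip_network(p)
    for p in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(client_ip, blocked=BLOCKED_PREFIXES):
    """Return True if client_ip falls inside any blocked prefix."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable input is treated as not blocked
    # Compare only against networks of the same IP version
    return any(addr.version == net.version and addr in net for net in blocked)


print(is_blocked("192.0.2.55"))
print(is_blocked("203.0.113.9"))
```

A linear scan is fine for a handful of prefixes. A large ASN can announce thousands, in which case a prefix trie or the firewall's own set type would likely be a better fit.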

hsuduebc2, today at 2:33 AM

Hey, how do you identify them? Is there a service that can tell you which of these companies scraped you?

websap, yesterday at 10:52 PM

[flagged]
