theamk • yesterday at 11:56 PM • 2 replies

no? it takes 10 seconds to check:

> The /crawl endpoint respects the directives of robots.txt files, including crawl-delay. All URLs that /crawl is directed not to crawl are listed in the response with "status": "disallowed".

You don't need any scraping countermeasures for crawlers like those.
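For anyone unfamiliar with the directives in the quote, here is a minimal sketch of how a crawler might honor `Disallow` and `Crawl-delay` and label blocked URLs `"disallowed"`. It uses Python's standard-library `urllib.robotparser`. The robots.txt contents, the URLs, the `plan_crawl` helper, and the `ExampleCrawler/1.0` user-agent are all hypothetical. This is not how the /crawl endpoint is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler should wait 10 seconds between
# requests and stay out of /private/. Paths and values are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def plan_crawl(robots_txt, user_agent, urls):
    """Return (crawl delay, per-URL status) for user_agent.

    The "disallowed" label mirrors the wording in the quoted docs; this is a
    hypothetical helper, not the /crawl endpoint's real code.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None if no Crawl-delay applies
    results = [
        {"url": url,
         "status": "allowed" if parser.can_fetch(user_agent, url) else "disallowed"}
        for url in urls
    ]
    return delay, results


delay, results = plan_crawl(
    ROBOTS_TXT,
    "ExampleCrawler/1.0",  # hypothetical user-agent string
    ["https://example.com/blog/post", "https://example.com/private/report.html"],
)
print(delay)  # 10
for r in results:
    print(r)
```

A polite crawler would then sleep for `delay` seconds between requests to that host.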

Replies

Macha • today at 1:10 AM

So what’s the user agent for their bot? They don’t seem to specify the default in the docs, and it looks like it’s user-configurable. So it’s yet another opt-out bot that your web server needs special matching rules to block.
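The server-side blocking Macha describes, which means matching on the User-Agent header before serving the page, could look roughly like this WSGI middleware sketch in Python. The `examplecrawler` token and the `block_user_agents` and `status_for` helpers are hypothetical. Because the crawler's default user-agent isn't documented and can be changed, a list like this only catches crawlers that keep a recognizable name.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical, lowercase user-agent substrings to refuse. The crawler's real
# default user-agent isn't documented in the thread, so this is a placeholder.
BLOCKED_UA_SUBSTRINGS = ("examplecrawler",)


def block_user_agents(app, blocked=BLOCKED_UA_SUBSTRINGS):
    """Wrap a WSGI app and answer 403 when the User-Agent contains a blocked substring."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_user_agents(hello_app)


def status_for(user_agent):
    """Call the wrapped app in-process and return its HTTP status line."""
    environ = {"HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fill in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    list(app(environ, start_response))  # consume the response body
    return captured["status"]


print(status_for("ExampleCrawler/1.0"))  # 403 Forbidden
print(status_for("RenamedBot/1.0"))      # 200 OK: a changed user-agent gets through
```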

PeterStuer • today at 7:50 AM

Like they explain in the docs, their crawler will respect user-agents disallowed in robots.txt, right after the section that explains how to change your user-agent.
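The point behind the sarcasm is that robots.txt rules are grouped by user-agent. A crawler follows the group whose name matches the user-agent it sends, or, failing that, the `*` group. The sketch below uses Python's `urllib.robotparser` to show a renamed crawler slipping past a rule written for its default name. That parser matches names with a simple case-insensitive substring check, and other parsers may differ. `ExampleCrawler` and `MyRenamedBot` are hypothetical names, not the /crawl endpoint's real user-agent.

```python
from urllib.robotparser import RobotFileParser

# A site disallows one crawler by name and allows everyone else.
# "ExampleCrawler" is a hypothetical token.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/1"
blocked = parser.can_fetch("ExampleCrawler/1.0", url)  # named group matches
renamed = parser.can_fetch("MyRenamedBot/1.0", url)    # falls through to "*"
print(blocked)  # False
print(renamed)  # True
```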
