Hacker News

RamblingCTO today at 8:43 AM | 8 replies

Doesn't work for pages protected by cloudflare in my experience. What a shame, they could've produced the problem and sold the solution.


Replies

paxys today at 3:31 PM

That’s what they are doing. This is a textbook protection racket.

“Buy Cloudflare bot protection, otherwise it would be a shame if your site got scraped and ddos’d.”

Who is doing the scraping and ddosing? Cloudflare.

tyingq today at 2:22 PM

That's too funny. If true, really looking forward to the Cloudflare response here. I'm unsure how you would spin that in a way that didn't seem self-serving.

kentonv today at 6:21 PM

Cloudflare crawl respects robots.txt. It does not attempt to bypass any anti-crawling measures. If the site doesn't want to be crawled -- whether it uses Cloudflare or not -- this product will not help you crawl it.

Some sites actually want crawlers -- e.g. sites that are selling a product, documentation, etc. That's what this product is meant for.

https://x.com/CloudflareDev/status/2031745285517455615

(Disclosure: I work for Cloudflare but not on this product.)

GodelNumbering today at 1:04 PM

I imagine that would cause a backlash from the website owners trusting Cloudflare to keep their content 'safe'.

antonyh today at 3:01 PM

Wait. What?

Is this just a way to strong-arm non-cloudflarians into adopting Cloudflare's platform if they don't want their sites crawled? It does sound like Cloudflare is selling the solution to its own content crawler.

chvid today at 9:03 AM

As long as it gets past Azure's bot protection ...

davidhariri today at 1:46 PM

Came here to write this. I am getting much better results from Firecrawl (not affiliated with them, just a happy customer).

ekropotin today at 3:10 PM

Please tell me you are joking.