Hacker News

madeofpalk today at 12:15 PM (8 replies)

Is there any evidence or hints that these actually work?

It seems pretty reasonable that any scraper would already have mitigations for things like this as a function of just being on the internet.


Replies

raincole today at 1:57 PM

It might work against people who just use their Mac Mini with OpenClaw to summarize news every morning, but it certainly won't work against Google.

More centralized web ftw.

sd9 today at 12:18 PM

Even if it did work, I just can't bring myself to care enough. It doesn't feel like anything I could do on my site would make any material difference. I'm tired.

xyzal today at 3:33 PM

About two years ago, I made up a reference to a nonexistent Python library and put code "using" it in just 5 GitHub repos. Several months later the free ChatGPT picked it up. So IMO it works.

bediger4000 today at 2:46 PM

The search engine crawlers are sophisticated enough, but Meta's are not. Neither is Anthropic's Claude crawler. Source: personal experience trying garbage generators on Yandex, Blexbot, Meta's, and Anthropic's crawlers.

I'm completely uncertain that the unsophisticated garbage I generated makes any difference, much less "poisons" the LLMs. A fellow can dream, can't he?

spiderfarmer today at 1:47 PM

There are hundreds of bots using residential proxies. That is not free. Make them pay.

m00dy today at 1:33 PM

It won't work, especially on Gemini. Googlebot is very experienced when it comes to crawling. It might work against OpenAI and others.

nubg today at 12:37 PM

What kind of mitigations? How would you detect the poison fountain?

phoronixrly today at 12:56 PM

It does work, on two levels:

1. Simple, cheap, easy-to-detect bots will scrape the poison, and feed links to expensive-to-run browser-based bots that you can't detect in any other way.

2. Once you see a browser visit a bullshit link, you insta-ban it, as you can now see that it is a bot because it has been poisoned with the bullshit data.
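
The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how any particular tool does it: the `/trap/` prefix, the `TrapBanList` class, and banning by client IP are all hypothetical choices made for the example. The idea is that trap links are only ever advertised inside the poison pages, so any client that requests one must have received it from a scraper.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical URL prefix; links under it appear only inside poison pages.
TRAP_PREFIX = "/trap/"


@dataclass
class TrapBanList:
    """Tracks clients that followed a link only a scraper could have seen."""

    banned: set[str] = field(default_factory=set)

    def handle(self, client_ip: str, path: str) -> int:
        """Return the HTTP status code to serve for this request."""
        if client_ip in self.banned:
            return 403  # already identified as a bot
        if path.startswith(TRAP_PREFIX):
            # Assumption: no human ever sees these links, so ban on first visit.
            self.banned.add(client_ip)
            return 403
        return 200  # normal traffic passes through


bans = TrapBanList()
print(bans.handle("203.0.113.7", "/index.html"))  # ordinary page
print(bans.handle("203.0.113.7", "/trap/a1b2"))   # trap link: client gets banned
print(bans.handle("203.0.113.7", "/index.html"))  # now refused everywhere
```

Keying the ban on IP alone is a simplification; bots that rotate through residential proxies would probably need a stronger fingerprint.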

My personal preference is using iocaine for this purpose though, in order to protect the entire server as opposed to a single site.