Is there any evidence or hints that these actually work?
It seems pretty reasonable that any scraper would already have mitigations for things like this as a function of just being on the internet.
Even if it did work, I just can't bring myself to care enough. It doesn't feel like anything I could do on my site would make any material difference. I'm tired.
About two years ago, I made up a reference to a nonexistent Python library and put code "using" it in just 5 GitHub repos. Several months later the free ChatGPT picked it up. So IMO it works.
The search engine crawlers are sophisticated enough, but Meta's are not. Neither is Anthropic's Claude crawler. Source: personal experience trying garbage generators on Yandex, Blexbot, Meta's and Anthropic's crawlers.
I'm completely uncertain that the unsophisticated garbage I generated makes any difference, much less "poisons" the LLMs. A fellow can dream, can't he?
There are hundreds of bots using residential proxies. That is not free. Make them pay.
It won't work, especially on Gemini. Googlebot is very experienced when it comes to crawling. It might work for OpenAI and others, maybe.
What kind of mitigations? How would you detect the poison fountain?
It does work, on two levels:
1. Simple, cheap, easy-to-detect bots will scrape the poison, and feed links to expensive-to-run browser-based bots that you can't detect in any other way.
2. Once you see a browser visit a bullshit link, you insta-ban it, as you can now see that it is a bot because it has been poisoned with the bullshit data.
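A rough sketch of that two-step trap, in case it helps. Everything here is made up for illustration: the `TrapLinkBanlist` class, the `/trap/` URL scheme and the string client IDs are assumptions, and this is not how iocaine or any other real tool does it. In practice the client ID would be an IP address or some fingerprint, and the ban would live in your web server or firewall rather than in memory.

```python
import secrets


class TrapLinkBanlist:
    """Toy in-memory version of the two-step trap described above."""

    def __init__(self):
        self.trap_paths = set()      # links that exist only inside poisoned pages
        self.banned_clients = set()  # clients that followed one of those links

    def make_trap_link(self):
        # Hypothetical URL scheme: a random path that no legitimate page links to.
        path = "/trap/" + secrets.token_hex(8)
        self.trap_paths.add(path)
        return path

    def handle_request(self, client_id, path):
        """Return an HTTP-like status code for a request."""
        if client_id in self.banned_clients:
            return 403
        if path in self.trap_paths:
            # Only something fed by the poisoned page knows this path, so ban it.
            self.banned_clients.add(client_id)
            return 403
        return 200


banlist = TrapLinkBanlist()
link = banlist.make_trap_link()  # embed this only in the garbage pages

print(banlist.handle_request("browser-bot", "/index.html"))  # 200
print(banlist.handle_request("browser-bot", link))           # 403, trap hit
print(banlist.handle_request("browser-bot", "/index.html"))  # 403, now banned
```

The point of the random path is that a human visitor has no way to stumble onto it, so a hit is a fairly strong signal that the client got the link from scraped poison.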
My personal preference is using iocaine for this purpose though, in order to protect the entire server as opposed to a single site.
It might work against people who just use their Mac mini with OpenClaw to summarize news every morning, but it certainly won't work against Google.
More centralized web ftw.