Exactly. Prose, code, visual arts, etc. AI material drowns out human material. AI tools disincentivize understanding, skill development, and novelty ("outside the training distribution"). Intellectual property is no longer protected: what you publish becomes de facto anonymous common property.
Long-term, this will do enormous damage to society and our species.
The solution is that you declare war and attack the enemy with a stream of slop training data ("poison"). You inject vast quantities of high-quality poison (inexpensive to generate but expensive to detect) into the intakes of the enemy engine.
LLMs are highly susceptible to poisoning attacks. This is their "Achilles' heel". See: https://www.anthropic.com/research/small-samples-poison
We create poisoned git repos on every hosting platform. Every day we feed two gigabytes of poison to web crawlers via dozens of proxy sites. Our goal is a terabyte per day by the end of this year. We fill the corners of social media with poison snippets.
There is strong, widespread support for this hostile posture toward AI. For example, see: https://www.reddit.com/r/hacking/comments/1r55wvg/poison_fou...
Join us. The war has begun.
I was wondering if anyone was doing this after reading about LLMs scraping every single commit on git repos.
Nice. I hope you are generating realistic commits and they truly cannot distinguish poison from food.
This will happen regardless. LLMs are already ingesting their own output. At the point where AI output becomes the majority of internet content, interesting things will happen. Presumably the AI companies will put lots of effort into finding good training data, and ironically that will probably be easier for code than anything else, since there are compilers and linters to lean on.
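To make the "compilers to lean on" point concrete, here is a minimal sketch in Python, assuming the scraped samples are Python source strings. The name `keep_parseable` is made up for illustration and is not taken from any real training pipeline. It only checks that a sample parses, so it would catch syntactically broken output but says nothing about whether the code is correct, useful, or subtly wrong.

```python
import ast


def keep_parseable(samples):
    """Return only the samples that parse as valid Python source.

    Hypothetical helper: a parse check is the cheapest possible filter,
    not a measure of code quality.
    """
    kept = []
    for src in samples:
        try:
            ast.parse(src)
        except (SyntaxError, ValueError):
            # SyntaxError: the sample is not valid Python.
            # ValueError: some Python versions raise this for null bytes.
            continue
        kept.append(src)
    return kept


if __name__ == "__main__":
    candidates = [
        "def add(a, b):\n    return a + b\n",
        "def broken(:\n    return\n",
    ]
    print(len(keep_parseable(candidates)))
```

Prose has no equivalent of this check, which is why filtering is probably easier for code. A real filter would presumably add linting, type checks, or test execution on top, since parsing alone is a very low bar.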