Reddit sold its data to AI companies for training[1]. They could have refused, but companies like OpenAI likely would have harvested that data anyway. As such, it should not be surprising that AI models are pretty good at generating Reddit posts. They were specifically trained to do that.
This is sad, because Reddit had remained one of the final bastions of human content on the internet. For several years, appending "site:reddit.com" to a Google search was a reliable way to get something usable out of it. Doing that is still an improvement over raw-dogging Google's ranking algorithms with an unfettered search, but the results are increasingly AI slop.
This is one of my great disappointments in the current rise of AI. LLMs can give good search results when dealing with a topic they've been specifically trained on by human experts, but they're not good at separating human-produced signal from AI slop noise. We've done nothing to prevent a sea of AI slop from being dumped on top of all the human signal that's out there. When AI companies enter their enshittification phase and stop investing in expert human trainers, the search results LLMs produce are going to fall off a cliff. Search is a bigger problem than ever.
_____
[1]https://9to5mac.com/2024/02/19/reddit-user-content-being-sol...