> It certainly seems possible that the actual source of the data is the output of some other LLM.
My guess is that you can already see this happening with the bots on Reddit: they keep refining their answers to one particular thing, and you often get two or three identical responses in a row from different users, because the bots have been reinforcing themselves by digesting the output of other bots. I'm waiting to see when they start cutting their sentences short and talking garbage.