"say that AI developers should incorporate more real-world diversity into large language model (LLM) training sets,"
Are you kidding me?
How much more "real-world diversity" could they possibly incorporate into the models than the entire freaking Internet, plus every scrap of text on paper the AI companies could get hold of?
How on Earth could someone think that AIs speak like this because their training set is full of LLM-speak? That is transparently false.
This is the sort of massive, blinding error that calls everything else in the article into question. Whatever their mental model of AI is, it bears no resemblance to reality.
The problem isn't a lack of diversity in the training set. The problem is that the method, by design, picks the average.
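To make that concrete, here's a toy sketch (my own illustration, not anything from the quoted article): even if the training data contains plenty of diverse phrasings, a model fit by maximum likelihood just learns their frequencies, and greedy (temperature-near-zero) decoding then emits the single most common phrasing every time. The word choices and counts below are made up for illustration.

```python
from collections import Counter

# Hypothetical training data: four diverse ways to say "examine",
# with "delve" merely being the most frequent.
training_continuations = (
    ["delve"] * 40
    + ["dig into"] * 25
    + ["explore"] * 20
    + ["unpack"] * 15
)

# A maximum-likelihood model learns the empirical frequencies.
counts = Counter(training_continuations)
total = sum(counts.values())
learned_probs = {word: c / total for word, c in counts.items()}

def greedy_decode(probs):
    # Greedy decoding: always return the mode of the distribution,
    # no matter how much diversity sits underneath it.
    return max(probs, key=probs.get)

print(learned_probs)
print(greedy_decode(learned_probs))  # "delve", every single time
```

Adding more diverse minority phrasings to the data doesn't change the output at all; only dethroning the mode would. That's the averaging effect, and it's a property of the decoding method, not of the data.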