Oh my god did we inadvertently train AIs on idiotspeak.
There was nothing inadvertent about it. A decade of cultivating and harvesting millions of examples of this kind of pseudo-writing from underpaid internet piece-workers preceded LLMs.
Given that this specific style is the result of being reinforced over and over again via RLHF, "inadvertently" isn't really the word I'd use.
In-advert-ently?
> did we inadvertently train AIs on idiotspeak.
Nope! That is, training on lowest-common-denominator, low-signal, high-noise "idiotspeak" was not at all inadvertent.
It's engaging and I doubt it happened by accident.
No, we actually trained it on standardised tests: https://marcusolang.substack.com/p/im-kenyan-i-dont-write-li...
* Checks notes *
4chan
Call of Duty chat logs
Every public marketing site
Slashdot
Usenet
...
Verdict: Yes, idiotspeak was part of the training set, but no, it was not inadvertent. There's a smattering of Shakespeare in there, at least.
It seems to be called the Rule of 3. See https://en.wikipedia.org/wiki/Rule_of_three_(writing)
As with Caesar's supposed "Veni, vidi, vici", people seem to prefer and remember items grouped in threes.
I recall a public-speaking film starring John Cleese, shown to my management science class, that mentioned this Rule of 3.