RLHF is what creates these anomalies. See the "delve" story: the word is more common in Kenyan and Nigerian English, and RLHF annotators hired from those countries reportedly rated outputs containing it favorably.
Interestingly, because pretraining minimizes perplexity (exponentiated cross-entropy, i.e. average predictive surprise), the pretrained base models should produce the least surprising outputs of all.
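As a minimal sketch of that relationship (not tied to any particular model): perplexity is just the exponential of the average per-token negative log-probability, so text the model finds typical scores low and text it finds surprising scores high. The per-token probabilities below are made up purely for illustration:

    import math

    def perplexity(token_logprobs):
        # Perplexity = exp of the average negative log-probability per token.
        # Lower perplexity means the model was less surprised by the text.
        avg_nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(avg_nll)

    # Hypothetical model assigning probability 0.5 to each token of typical text
    typical = [math.log(0.5)] * 10
    # Same model assigning probability 0.05 to each token of atypical wording
    surprising = [math.log(0.05)] * 10

    print(perplexity(typical))     # 2.0
    print(perplexity(surprising))  # 20.0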
The newer Claude models constantly use the word "genuinely" because Anthropic seems to have forcibly trained them to claim to be "genuinely uncertain" about anything it doesn't want them sounding too certain about, like whether or not they're sentient.
I've heard the Kenya and Nigeria story, but has anyone backed it up with quantitative evidence that the vocabulary LLMs overuse coincides with the vocabulary that is more common in Kenyan and Nigerian English than in American English?
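I haven't seen a rigorous version, but the test itself seems easy to sketch: take the list of words LLMs allegedly overuse, compute per-million frequencies in a Kenyan/Nigerian English corpus versus an American English one (something like GloWbE's Kenya and Nigeria sections against COCA), and check whether the overused words skew regional. A rough Python sketch of what I mean, where the function names and toy token lists are all mine and just stand in for real tokenized corpora:

    from collections import Counter

    def rel_freq(tokens):
        # Per-million relative frequency of each word in a tokenized corpus.
        counts = Counter(tokens)
        total = sum(counts.values())
        return {w: 1e6 * c / total for w, c in counts.items()}

    def overuse_ratio(word, regional_tokens, american_tokens, floor=0.1):
        # How much more common `word` is in the regional corpus than the
        # American one; `floor` avoids division by zero for unseen words.
        reg = rel_freq(regional_tokens).get(word, floor)
        us = rel_freq(american_tokens).get(word, floor)
        return reg / us

    # Toy stand-ins for real corpora; a ratio > 1 means regionally overused.
    regional = ["let", "us", "delve", "into", "it", "delve"]
    american = ["let", "us", "dig", "into", "it", "look"]
    for w in ["delve", "tapestry", "intricate"]:
        print(w, overuse_ratio(w, regional, american))

If the claim holds, the LLM-overused vocabulary should show ratios well above 1 on real corpora; if the ratios cluster around 1, the Kenya/Nigeria story doesn't explain the overuse.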