Nice article and explanations!
On a tangential note, I keep noticing "why x matters", "it's crucial here" that just remind me of Claude. Recently Claude has been gaslighting me in complex problems with such statements and seeing them on an article is low-key infuriating at this point. I can't trust Claude anymore on the most complex problems where it sometimes gets the answer right but completely misses the point and introduces huge complex blocks of code and logic with precisely "why it matters", "this is crucial here".
I've seen many posts on Reddit in this AI-induced 'psychosis' when people end up believing the words that get generated for them without applying sufficient critical thought.
This sycophancy is a serious problem and exploits a weakness in the human psyche (flattery) which may be easier for the RLHF to find reward in than genuinely correct responses.