There is no separation. Incentive propagates through LLMs with approximately zero resistance. If the input tells a story, the output tends to come back with that story reinforced.
The code/PR generator is heavily incentivized to spin by RL on human feedback. As soon as that spin comes into contact with your narrative-gen context, it's cooked: any output that has actually seen the spin is tainted and starts spinning itself. And then there's spin originating in the narrative gen on top of that. Hence the examples read like straight advertisements, totally contaminated, shot through with messaging like:
- this is solid, very trustworthy
- you can trust that this is reliable logic with a sensible, comprehensible design
- the patterns are great and very professional and responsible
- etc
If the narrative reads like a glow-up photoshoot for the PR, something has gone extremely wrong. It is not conducive to reviewing the PR fairly: the work is presented as far better than it actually is. Even if there are no outright lies, the whole thing is a mischaracterization.
RL is a hell of a drug.
Anyway, this is the core problem with AI output. You cannot trust that the impression it presents matches reality, or is even a best attempt at reality. You have to carefully assemble your own view of reality in parallel with whatever it gives you, which is a massive pain in the ass. And if you skip that, you just continually let defects and slop through.
The worst problem mucking things up is that RL-discovered persuasion tricks that work on people also work on AI, because the AI is modelling human language patterns. Reviewing slop sucks because it's filled with (working) exploits against humans, and AI cannot help with the review because it is immediately subverted by those same exploits. So I guess the fix requires finding a way to strip out the exploits without changing the mechanical details. But that's hard, because the spin saturates 100% of the output at many levels of abstraction, including the mechanical details themselves.
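To make concrete why the naive version of "strip the exploits" doesn't work, here is a toy sketch. It assumes spin can be flagged by a hypothetical list of evaluative marker words (an assumption the post itself refutes: spin also lives in framing, ordering, and word choice inside the mechanical sentences this filter keeps). All names here are illustrative, not from any real tool.

```python
import re

# Hypothetical markers of evaluative spin; real spin is far subtler than this.
SPIN_MARKERS = {
    "robust", "clean", "elegant", "trustworthy", "reliable",
    "professional", "responsible", "solid", "sensible",
}

def strip_spin(text: str) -> str:
    """Drop sentences containing evaluative markers, keep the rest.

    A naive first pass at "strip the exploits, keep the mechanics".
    It fails exactly as described above: spin saturates the mechanical
    sentences this keeps, not just the ones it throws away.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [
        s for s in sentences
        if not any(m in s.lower() for m in SPIN_MARKERS)
    ]
    return " ".join(kept)

desc = (
    "Adds retry logic to the fetch path. "
    "The design is clean, robust, and very professional. "
    "Retries are capped at 3 with exponential backoff."
)
print(strip_spin(desc))
# → Adds retry logic to the fetch path. Retries are capped at 3 with exponential backoff.
```

Even in this best case, the filter only removes the loudest layer of spin; it says nothing about whether the claims in the surviving mechanical sentences are themselves framed honestly.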