Arifcodes · yesterday at 9:01 PM

The core mechanic described here is real. RLHF does optimize toward the mean; that is just what happens when you train on human preference ratings and raters consistently reward clear, inoffensive, "polished" output.
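
Here is a toy version of that collapse (made-up approval rates, a bare softmax policy with a REINFORCE-style update; nothing like a production RLHF stack): once the reward is mean rater approval, gradient ascent drains probability from anything a minority loved and a majority merely tolerated.

```python
import numpy as np

# Toy illustration only. Three candidate phrasings of the same sentence,
# with hypothetical rater approval rates (fraction of raters who'd prefer each).
candidates = ["plain/polished", "mildly idiosyncratic", "strongly idiosyncratic"]
approval = np.array([0.80, 0.55, 0.30])

logits = np.zeros(3)  # policy starts uniform over the three styles
lr = 1.0

for step in range(200):
    probs = np.exp(logits) / np.exp(logits).sum()
    # Exact policy gradient for a softmax policy with mean approval as baseline:
    # candidates rated above average gain probability, the rest lose it.
    baseline = probs @ approval
    logits += lr * probs * (approval - baseline)

probs = np.exp(logits) / np.exp(logits).sum()
for c, p in zip(candidates, probs):
    print(f"{c}: {p:.3f}")
# Nearly all the mass ends up on the broadly approved "polished" phrasing,
# even though 30% of raters preferred the most distinctive one.
```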

But the damage is not uniform. For code comments, API docs, commit messages: low-entropy output is often fine. The problem is people using LLMs for things that require a distinct voice and then wondering why the result sounds like everyone else on the internet.

The part nobody talks about: you can partially fight this if you know what you lost. Prompts like "preserve unusual word choices" or "do not normalize my rhetorical structure" help, but only if you have a strong enough baseline to catch the drift. Most people using AI for writing assistance do not have that baseline, which is why the ablation goes undetected. They see polished output and ship it.
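
If you want an actual baseline instead of vibes, even something crude helps. A rough sketch (my own heuristic, not a standard metric; the tiny stopword set below is a stand-in for any real frequency list): treat your uncommon words as "voice" and check which of them survive the edit.

```python
import re

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def voice_drift(original, edited, common_vocab):
    """Which of your distinctive words survived the edit?
    Everything outside `common_vocab` counts as voice."""
    distinctive = {w for w in tokens(original) if w not in common_vocab}
    surviving = distinctive & set(tokens(edited))
    lost = distinctive - surviving
    retention = len(surviving) / max(len(distinctive), 1)
    return retention, sorted(lost)

common = {"the", "a", "and", "of", "to", "is", "that", "it", "in", "for"}
draft = "The API thrums along until the scheduler hiccups, then everything gums up."
polished = "The API runs smoothly until the scheduler fails, then everything slows down."

retention, lost = voice_drift(draft, polished, common)
print(f"distinctive-word retention: {retention:.0%}")
print("lost:", lost)  # the idiosyncratic words the edit quietly removed
```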


Replies

JamesBarney · yesterday at 9:11 PM

The vast majority of people who write don't have a voice worth preserving. The rest can build out a voice document to make sure the AI doesn't strip it out.
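
Concretely, a voice document can be as simple as a standing system prompt attached to every editing request (the rules and message format below are illustrative, not any particular vendor's API):

```python
# Hypothetical "voice document" kept in version control and prepended
# to every request; names and structure are my own sketch.
VOICE_DOC = """\
Voice rules for edits to my writing:
- Keep sentence fragments where I use them for emphasis.
- Preserve unusual word choices; do not swap them for common synonyms.
- Do not normalize my rhetorical structure or merge short paragraphs.
- Flag, rather than silently fix, anything you think is an error.
"""

def build_messages(draft: str) -> list[dict]:
    return [
        {"role": "system", "content": VOICE_DOC},
        {"role": "user", "content": f"Edit for clarity only:\n\n{draft}"},
    ]
```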