stratos123 • yesterday at 12:55 PM

Base models don't write like that. This appears during RLHF. It's not totally clear why*, but probably a large part of the answer is that this style looks great to human reviewers, and only starts looking terrible once you get to play around with the released model and realise it talks like that all the time.

* The technical term is "mode collapse", see [1][2]

[1] https://en.wikipedia.org/wiki/Mode_collapse

[2] https://gwern.net/doc/reinforcement-learning/preference-lear...

Replies

Terretta • yesterday at 4:21 PM

> This appears during RLHF. It's not totally clear why…

Imagine a world (ha) where everyone writing on LinkedIn from cafes and couches starts disrupting AI by opting into rating ChatGPT responses.

How might that turn out?
