Hacker News

stratos123 yesterday at 12:55 PM

Base models don't write like that. This appears during RLHF. It's not totally clear why*, but probably a large part of the answer is that this style looks great to human reviewers, and only starts looking terrible once you get to play around with the released model and realise it talks like that all the time.

* The technical term is "mode collapse", see [1][2]

[1] https://en.wikipedia.org/wiki/Mode_collapse

[2] https://gwern.net/doc/reinforcement-learning/preference-lear...


Replies

Terretta yesterday at 4:21 PM

> This appears during RLHF. It's not totally clear why…

Imagine a world (ha) where everyone writing on LinkedIn from cafes and couches starts disrupting AI by opting into rating ChatGPT responses.

How might that turn out?