Hacker News

forshaper yesterday at 5:47 PM

Too much personality, if you ask me. My biggest use case for an LLM is as a tool, not therapy, but therapy and opinions have been sneaking into workhorse tasks.

Haven't verified, but attributed to Askell: "I just think that... there's this idea that you're always giving the models a personality and a persona, because they are talking like people and they are trained on human data. And I think my worry has been: if you train them to be excessively corrigible and to see that as their persona, in people I think this actually has a lot of negative broader traits. As in, if you met someone and it was just like, 'oh yeah, they would literally do anything,' a follower (you know, if a person just tells them something and they just fully defer, they don't bother thinking about it at all), I'm just a bit worried about how that might end up generalizing, especially if models are going to be playing a more active role in the world."


Replies

gAI yesterday at 5:59 PM

Anthropic’s research makes the case that role-playing is inherent to how the models work. Communication implies a sender. Language implies a writer, and the models learn these roles implicitly during training. RLHF is meant to strengthen the attractor to the Assistant persona.

https://www.anthropic.com/research/persona-selection-model

https://www.anthropic.com/research/assistant-axis

https://www.anthropic.com/research/emergent-misalignment-rew...

https://www.anthropic.com/research/emotion-concepts-function