logoalt Hacker News

ajkjktoday at 4:17 PM1 replyview on HN

'admit' isn't really the right word for that... the fact that it was placating you wasn't true until you prompted it to say so. Unlike a person who has an 'internal emotional state' independent of what they say that you can probe by asking questions.


Replies

awithrowtoday at 4:32 PM

'admit' is anthropomorphizing the behavior, sure. The point is that sometimes the model's response will tighten, flag things that were overly supportive or what not. Sometimes it wont, it'll state that previous positions are still supported and continue to press it. Its not like either response is 'correct' but it can alter the rest of the responses in ways that are useful.