logoalt Hacker News

tarsingeyesterday at 12:53 PM1 replyview on HN

To me it was already quite intuitive, we are not really managing the psychological state: at its core a LLM try to make the concatenation of your input + its generated output the more similar it can with what it has been trained on. I think it’s quite rare in the LLMs training set to have examples of well thought professional solution in a hackish and urgency context.


Replies

astrangeyesterday at 9:27 PM

No, that's how base model pretraining works. Claude's behavior is more based on its constitution and RLVR feedback, because that's the most recent thing that happened to it.