Hacker News

mrandish · today at 1:07 AM

> when the model won't actually be able to provide one

This is key. In my experience, asking an LLM why it did something is usually pointless. In a subsequent round, it generally can't meaningfully introspect on its prior internal state, so it's just reading the session transcript and extrapolating a plausible-sounding answer from its training data about how LLMs typically work.

That doesn't necessarily mean the reply is wrong: as usual, a statistically plausible-sounding answer sometimes also happens to be correct. But it has no fundamental truth value. I've gotten equally plausible answers just by pasting the same session transcript into a different LLM and asking why it did that.
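To make the mechanism concrete, here's a minimal sketch (no real API is called; the transcript contents and the "why" prompt are made up for illustration). The point is that a follow-up "why did you do that?" is just more text appended to the transcript, so the original model and a completely different model receive the exact same input:

```python
# Hypothetical chat transcript from an earlier round.
transcript = [
    {"role": "user", "content": "Refactor this function."},
    {"role": "assistant", "content": "I renamed the accumulator to `total`."},
]

# The follow-up "why" question is just another user message.
why = {"role": "user", "content": "Why did you rename it?"}

# Asking the SAME model in a subsequent round: the input is only text.
same_model_input = transcript + [why]

# Pasting the transcript into a DIFFERENT model: identical input.
other_model_input = list(transcript) + [why]

# Neither request carries any internal activations from the first round,
# so both models answer from the same evidence: the transcript alone.
assert same_model_input == other_model_input
```

That equality is the whole argument: since no hidden state from the original generation survives into the follow-up request, any model given the transcript is doing the same after-the-fact extrapolation.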


Replies

vanviegen · today at 6:51 AM

> In a subsequent round, it generally can't meaningfully introspect on its prior internal state

It can't do any better in the moment it's making the choices, either. Introspection mostly amounts to back-rationalisation, just like in humans. Though for humans, doing so may at least help them learn to make better decisions in similar future situations.
