Common misconception. As far as we know, LLMs are not calibrated, i.e. their output "probabilities" do not necessarily track the actual error rates, so you can't use e.g. the softmax values to estimate confidence. That's why it is more accurate to talk about the model's "logits", "softmax values", "simplex mapping", "pseudo-probabilities", or, even more agnostically, just "output scores", unless you actually have strong evidence of calibration.
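To make "calibrated" concrete, here's a rough sketch of how you could check it yourself: bin the model's top softmax values and compare mean confidence to empirical accuracy per bin (this is the standard expected calibration error; the function name and toy numbers below are just for illustration):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; compare mean confidence to
    empirical accuracy in each bin. Lower ECE = better calibrated."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap   # weight by fraction of samples in bin
    return ece

# Toy example: top softmax value per answer vs. whether it checked out.
conf = [0.99, 0.95, 0.90, 0.85, 0.80]
hit  = [1,    0,    1,    0,    0]
print(expected_calibration_error(conf, hit))
```

If the softmax values were real probabilities, the per-bin gaps would be near zero; in practice they usually aren't.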
To get calibrated probabilities you actually need to apply calibration techniques, and it is extremely unclear whether any frontier models do this (or even how calibration could be done effectively in fancy chain-of-thought + MoE models, and/or in RLVR- and RLHF-based training regimes). I suppose if you get into things like conformal prediction you could guarantee some calibration, but that is likely too computationally expensive and/or has other undesirable side effects.
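For reference, the classic post-hoc technique on ordinary classifiers is temperature scaling (Guo et al. 2017): fit a single scalar T on held-out data and divide logits by it before the softmax. A minimal sketch, assuming you have raw logits and gold labels, which for frontier LLMs you generally don't:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll_at_temperature(T, logits, labels):
    """Mean negative log-likelihood of the true labels at temperature T."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels):
    """Fit one scalar T on held-out (logits, labels) by minimizing NLL.
    At inference, divide logits by T before applying softmax."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    res = minimize_scalar(nll_at_temperature, bounds=(0.05, 10.0),
                          args=(logits, labels), method="bounded")
    return res.x
```

Note this doesn't change the argmax at all, it only rescales the confidence values, which is exactly why it's cheap but also why it can't fix deeper problems.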
EDIT: Oh, and there are also anomaly detection approaches, which try to identify when we are in outlier territory using various metrics (e.g. distances) over the embeddings, but even getting actual probabilities out of those is tricky. This is why it is so hard to get models to say they "don't know" with any kind of statistical certainty: that information generally isn't actually "there" in the model in any clean sense.
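The simplest version of that idea is a Mahalanobis-distance outlier score over embeddings; a sketch (function names are mine), which also shows the problem, since what you get is a distance, not a probability:

```python
import numpy as np

def fit_gaussian(train_embeddings):
    """Fit mean + regularized covariance of in-distribution embeddings."""
    mu = train_embeddings.mean(axis=0)
    cov = np.cov(train_embeddings, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize for invertibility
    return mu, np.linalg.inv(cov)

def outlier_score(x, mu, prec):
    """Mahalanobis distance: higher = further from the training data."""
    d = x - mu
    return float(np.sqrt(d @ prec @ d))
```

Turning that score into "the model doesn't know, with probability p" still requires a calibration set and a threshold choice, which brings you right back to the original problem.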
I don't think it's that hard to get them to say "I don't know"
I'm pretty sure they are actively trained to avoid it.
Besides, like, what would you do if you asked your $200/mo AI something and it blanked on you?
I don't know if we are talking past each other, but I don't think this conversation is about absolute probabilities? The question is about relative uncertainty, and the softmax values are just fine for that.
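For example, the entropy of the next-token softmax is a perfectly usable relative signal, no calibration needed; a minimal sketch:

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution.
    Useless as an absolute error rate, but fine for ranking:
    higher entropy = the model was relatively less sure here."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()               # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())
```

Comparing two positions or two candidate answers by entropy tells you which one the model is relatively less confident about, even if neither number means anything in absolute terms.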
It is too computationally expensive, which is why nobody does this for production inference. But there are alignment tools that let researchers at the frontier labs extract these latent-space probabilities.