I’m with the people pushing back on the “confidence scores” framing, but I think the deeper issue is that we’re still stuck in the wrong mental model.
It’s tempting to think of a language model as a shallow search engine that happens to output text, but that metaphor doesn’t actually match what’s happening under the hood. A model doesn’t “know” facts or measure uncertainty in a Bayesian sense. All it really does is traverse a high‑dimensional statistical manifold of language usage, trying to produce the most plausible continuation.
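The "most plausible continuation" point can be made concrete with a toy sketch of next-token decoding. The vocabulary and logit values below are made up for illustration; the point is only that softmax-plus-greedy-decoding optimizes plausibility, with no truth check anywhere in the loop.

```python
import math

def softmax(logits):
    # Convert raw scores into a probability distribution
    # (shift by max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate tokens and model scores for the prompt
# "The capital of France is ..."
vocab = ["Paris", "London", "Rome", "banana"]
logits = [4.1, 2.3, 1.9, -3.0]

probs = softmax(logits)
# Greedy decoding: pick the highest-probability token, whether or not
# it happens to be factually correct.
best = vocab[probs.index(max(probs))]
```

Here the plausible answer happens to be the correct one, but nothing in the mechanism distinguishes those two properties.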
That’s why a confidence number that looks sensible can still be as made up as the underlying output, because both are just sequences of tokens tied to trained patterns, not anchored truth values. If you want truth, you want something that couples probability distributions to real world evidence sources and flags when it doesn’t have enough grounding to answer, ideally with explicit uncertainty, not hand‑waviness.
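One crude way to sketch the "flag when it doesn't have enough grounding" idea: treat a flat (high-entropy) next-token distribution as a cue to abstain rather than answer. The threshold and the example distributions below are illustrative assumptions, not a calibrated method.

```python
import math

def entropy(probs):
    # Shannon entropy in bits; higher means a flatter, less
    # committed distribution over candidate tokens.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def answer_or_abstain(probs, threshold=1.0):
    # Abstain when the distribution is too spread out to trust.
    if entropy(probs) > threshold:
        return "abstain"
    return "answer"

confident = [0.9, 0.05, 0.03, 0.02]  # peaked: low entropy
uncertain = [0.3, 0.3, 0.2, 0.2]     # flat: high entropy
```

This still measures the model's internal spread, not truth, which is exactly the gap the comment is pointing at: real grounding needs an external evidence source, not just a statistic over the same token distribution.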
People talk about hallucination like it’s a bug that can be patched at the surface level. I think it’s actually a feature of the architecture we’re using: generating plausible continuations by design. You have to change the shape of the model or augment it with tooling that directly references verified knowledge sources before you get reliability that matters.
>A model doesn’t “know” facts or measure uncertainty in a Bayesian sense. All it really does is traverse a high‑dimensional statistical manifold of language usage, trying to produce the most plausible continuation.
And is that so different from what we do behind the scenes? Is there a difference between an actual fact and some false information stored in our brain? Or do both have the same representation in some kind of high‑dimensional statistical manifold in our brains, from which we also "try to produce the most plausible continuation"?
One major difference might sit at a different level: what we're fed (read, see, hear, etc.) we also evaluate before storing. Does LLM training do that, beyond some kind of crude, manually assigned "confidence tiers" applied to input material (e.g. trust Wikipedia more than Reddit threads)?
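The crude "confidence tiers" the comment describes amount to weighting training sources by trust when sampling documents. The source names and weights below are made-up assumptions, just to show the shape of the mechanism.

```python
import random

# Hypothetical trust weights per source: higher weight means the
# source is sampled more often during training.
tiers = {"wikipedia": 3.0, "books": 2.0, "reddit": 0.5}

docs = [
    ("wikipedia", "doc A"),
    ("reddit", "doc B"),
    ("books", "doc C"),
]

def sample_doc(docs, tiers, rng=random.Random(0)):
    # Weighted sampling: trusted sources are over-represented, but
    # nothing here evaluates whether any individual document is true.
    weights = [tiers[src] for src, _ in docs]
    return rng.choices(docs, weights=weights, k=1)[0]
```

Note that this weights *sources*, not *claims*, which is why it falls well short of the per-item evaluation the comment attributes to human learning.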
Hallucinations are a feature of reality that LLMs have inherited.
It’s amazing that experts like yourself, who have a good grasp of the manifold MoE configuration, don’t get that.
LLMs, much like humans, weight signals across high‑dimensional representations spanning the entire model, traverse that manifold, and then string together the best‑weighted attentive answer.
Just as your doctor occasionally gives you wrong advice too quickly, this sometimes goes wrong too, either by lighting up too much of the manifold or by having insufficient expertise.
A different way to look at it is language models do know things, but the contents of their own knowledge is not one of those things.
It's not even a manifold https://arxiv.org/abs/2504.01002
You have a subtle sleight of hand.
You use the word “plausible” instead of “correct.”
I mean... that is exactly how our memory works. So in a sense, the factually incorrect information coming from an LLM is as reliable as someone telling you things from memory.
Solid agree. Hallucination for me IS the LLM use case. What I am looking for are ideas that may or may not be true that I have not considered and then I go try to find out which I can use and why.