This counts only incorrect answers though. A model can get 0% hallucination rate just by refusing to...

jampekka • yesterday at 7:27 PM • 4 replies

This counts only incorrect answers though. A model can get 0% hallucination rate just by refusing to answer all questions.

Replies

ffsm8 • yesterday at 8:21 PM

Isn't that precisely the reason why we introduced the term hallucination? Because LLMs have historically always made up bullshit if they cannot answer directly... If they have now nailed this so that the model declines to respond instead of responding incorrectly, then a lot of previously unusable use cases would become feasible.

So I feel like that's exactly the right metric and the way to track it wrt hallucinations.

jug • yesterday at 10:36 PM

I think that's what the Omniscience Index is for:

https://artificialanalysis.ai/evaluations/omniscience#aa-omn...

It rewards correct answers, penalizes hallucinations, and gives no reward for refusing to answer.
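To make the difference between the two metrics concrete, here is a rough sketch. The exact formulas are my assumptions, not the benchmark's published definitions: the hallucination rate is taken as incorrect answers over all questions, and the index as +1 per correct answer, -1 per incorrect answer, 0 per refusal, scaled to the range -100 to 100. The function names and the three toy models are made up for illustration.

```python
# Toy comparison of a plain hallucination rate with a penalized index.
# ASSUMPTION: both formulas are illustrative guesses, not the official
# definitions used by any benchmark.

def hallucination_rate(outcomes):
    """Fraction of all questions that were answered incorrectly."""
    return sum(o == "incorrect" for o in outcomes) / len(outcomes)

def penalized_index(outcomes):
    """+1 per correct, -1 per incorrect, 0 per refusal, scaled to [-100, 100]."""
    points = {"correct": 1, "incorrect": -1, "refused": 0}
    return 100 * sum(points[o] for o in outcomes) / len(outcomes)

# Three hypothetical models answering the same 10 questions.
always_refuses = ["refused"] * 10
careful = ["correct"] * 6 + ["refused"] * 3 + ["incorrect"] * 1
guesser = ["correct"] * 7 + ["incorrect"] * 3

for name, outcomes in [("always_refuses", always_refuses),
                       ("careful", careful),
                       ("guesser", guesser)]:
    print(name, hallucination_rate(outcomes), penalized_index(outcomes))
```

Under these assumptions the model that refuses everything gets a perfect 0% hallucination rate but an index of 0, while the careful model scores higher than the guesser even though it gets fewer answers right.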

It's interesting just how poorly some popular Chinese models fare in this regard, like GLM 5.1 or DeepSeek 4 Pro.

Gemini 3.x has truly remarkable knowledge given how it leads in this benchmark despite being (quite a bit) more prone to hallucinate than Claude Opus.

aicantdeny • today at 1:12 AM

> by refusing to answer all questions.

Cool, precisely the thing other AIs are too stupid to do when they don't have the necessary knowledge.

speed_spread • yesterday at 8:31 PM

Yes. A model that can answer "I don't know" would be much more trustworthy than the used car salesman we have now.
