stcg • today at 3:34 PM • 1 reply

In the referenced benchmark GLM-5.2 (max) got 25% of all questions correct. GPT-5.5 (xhigh) got 57% correct.

https://artificialanalysis.ai/evaluations/omniscience

I'd much rather have some answer that I can verify than no answer to verify.

I don't want a model that says "I don't know", because I will verify the answer anyway.

Replies

> I don't want a model that says "I don't know", because I will verify the answer anyway.

Few people actually review answers or code, because they have been sold the myth that these models can do it all. The main problem is that LLMs don't have causal models, and as a result their reasoning is a high-probability word salad rather than a logically sound argument, particularly on tricky corner cases they haven't encountered. I would still agree with you that hallucinations are sometimes useful because they provide a strawman: having even a hallucinated answer to spar with is better than a "don't know".
