The non-hallucination rate in AA-omniscience is SOTA, better than Opus 4.7, Gemini 3.1 Pro and GPT5....

goldenarm • yesterday at 1:43 PM • 5 replies

The non-hallucination rate in AA-omniscience is SOTA, better than Opus 4.7, Gemini 3.1 Pro and GPT5.5! Congrats to the team

Replies

girvo • yesterday at 10:51 PM

The big question for me, having used a lot of these SOTA Chinese models, is: what is its token efficiency like?

Running Step 3.5 Flash locally, for example, it's an amazingly capable model all things considered, but its token efficiency is so bad that it gets outperformed by most others on wall-clock time (even with my MTP support for it hacked into llama.cpp: despite being trained on three heads, MTP 2 is the sweet spot, and only gets it from 20tk/s to 30tk/s on my Spark)

The DeepSeek models and Qwen 3.5 Plus are also good examples of this: compared to Opus, and especially GPT 5.5, they use many more tokens to get to the same answers.
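To make the wall-clock point concrete, here is a minimal sketch of the arithmetic. Only the 20 tk/s and 30 tk/s figures come from the comment above; the token counts and the function name `wall_clock_seconds` are made up for illustration and are not measurements of any model.

```python
def wall_clock_seconds(tokens_generated: int, tokens_per_second: float) -> float:
    """Time to finish an answer, ignoring prompt processing and other overhead."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens_generated / tokens_per_second


# Hypothetical token counts (assumptions, not benchmarks): a verbose model
# that needs 3x the tokens of a concise one to reach the same answer.
verbose_tokens = 30_000
concise_tokens = 10_000

# Throughput figures from the comment: 20 tk/s baseline, 30 tk/s with MTP 2.
verbose_time = wall_clock_seconds(verbose_tokens, 30.0)
concise_time = wall_clock_seconds(concise_tokens, 20.0)

print(f"verbose model at 30 tk/s: {verbose_time:.0f} s")
print(f"concise model at 20 tk/s: {concise_time:.0f} s")
```

Under these assumed numbers the verbose model still takes twice as long despite decoding 1.5x faster, which is why tokens-per-answer can matter as much as tokens-per-second.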

I'm really hoping that Qwen 3.7 is better in this regard; can't wait to try it out.

(ps. running DeepSeek v4 Flash on my Spark is absolutely wild, thanks antirez if you see this haha)

throawayonthe • yesterday at 2:21 PM

referencing this:

https://artificialanalysis.ai/evaluations/omniscience?models...

(had to add it to the chart, wasn't displayed by default. is it the lowest rate in the dataset or no?)

gslepak • yesterday at 3:34 PM

> The non-hallucination rate in AA-omniscience is SOTA

Note that a perfect "non-hallucination rate" is rather meaningless as such tests can contain human hallucinations.

It means the model aligns with the possibly-true, possibly-false beliefs of the group that made the test.

sheepscreek • yesterday at 3:01 PM

Truly incredible! Very impressed by their progress. I wonder how many of their own chips they used for training.

baq • yesterday at 3:23 PM

wonder at which level there's a capability state transition? 5%? 1%?
