wongarsu • today at 11:49 AM

It's also third best overall on "AA-Omniscience Non-Hallucination Rate", far higher than DeepSeek, GPT 5.5 or Fable.

That's the one benchmark that allows LLMs to answer "I don't know" and punishes them for trying to bullshit their way through the questions.
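To make the idea concrete, here is a rough sketch of how a scoring scheme like that could work. The function name `score_answers`, the +1/-1/0 weights, and the exact rate formula are my own assumptions for illustration, not the benchmark's published definition.

```python
from collections import Counter


def score_answers(outcomes):
    """Score a list of graded outcomes: "correct", "incorrect" or "abstain".

    Hypothetical scheme (an assumption, not the official benchmark formula):
    a correct answer earns +1, a wrong answer costs -1, "I don't know" is 0.
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    if total == 0:
        raise ValueError("no outcomes to score")

    # Wrong answers subtract from the score, so guessing is not free.
    penalized_score = (counts["correct"] - counts["incorrect"]) / total

    # Of the questions the model did not get right, the share where it
    # abstained instead of giving a wrong answer.
    missed = counts["incorrect"] + counts["abstain"]
    non_hallucination_rate = counts["abstain"] / missed if missed else 1.0

    return {
        "penalized_score": penalized_score,
        "non_hallucination_rate": non_hallucination_rate,
    }


# Two made-up models with the same number of correct answers.
guesser = ["correct"] * 4 + ["incorrect"] * 6
cautious = ["correct"] * 4 + ["incorrect"] * 1 + ["abstain"] * 5

guesser_result = score_answers(guesser)
cautious_result = score_answers(cautious)
```

Both made-up models know the same four answers, but under these assumed weights the one that admits ignorance scores higher, because abstaining costs nothing while a wrong guess is penalized.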
