wongarsu, today at 11:49 AM

It's also third-best overall on "AA-Omniscience Non-Hallucination Rate", far higher than DeepSeek, GPT 5.5, or Fable.

That's the one benchmark that allows LLMs to answer "I don't know" and punishes them for trying to bullshit their way through the questions.
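
To make the incentive concrete, here is a rough sketch of what abstention-aware scoring could look like. The +1 / -1 / 0 point values and both function names are my own assumptions for illustration, not the benchmark's published formula.

```python
# Hypothetical scoring: the point values below are assumed for illustration only.
POINTS = {"correct": 1, "wrong": -1, "abstain": 0}

def penalized_score(answers):
    """Mean score where a wrong answer costs as much as a right one earns."""
    return sum(POINTS[a] for a in answers) / len(answers)

def non_hallucination_rate(answers):
    """Of the questions the model did not get right, the share it abstained on
    instead of answering wrongly (an assumed definition)."""
    missed = [a for a in answers if a != "correct"]
    if not missed:
        return 1.0  # nothing missed, so nothing was hallucinated
    return sum(a == "abstain" for a in missed) / len(missed)

# Two imaginary models that both know 4 of 10 answers.
bluffer = ["correct"] * 4 + ["wrong"] * 6    # guesses on everything else
honest = ["correct"] * 4 + ["abstain"] * 6   # says "I don't know" instead

print(penalized_score(bluffer), penalized_score(honest))
print(non_hallucination_rate(bluffer), non_hallucination_rate(honest))
```

Under these assumed rules both models know the same amount, but the one that admits ignorance scores higher on both measures, which is the behavior the benchmark is meant to reward.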