
in-silico • yesterday at 9:24 PM

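As measured by #_no_answer / (#_incorrect + #_no_answer), i.e. the share of non-correct responses where the model declines to answer rather than giving a wrong answer, the top current models can do it 60-70% of the time (Grok 4.20 is the best at 83%): https://artificialanalysis.ai/evaluations/omniscience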
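For anyone who wants to reproduce the ratio from raw counts, here's a minimal sketch. The function name and the counts are hypothetical, not taken from the linked leaderboard. Note that correct answers don't appear anywhere in the formula, so the ratio only describes how a model fails, not how often it is right.

```python
def abstention_share(num_incorrect: int, num_no_answer: int) -> float:
    """Share of non-correct responses where the model declined to answer.

    Computes #_no_answer / (#_incorrect + #_no_answer). Correct answers
    are excluded from both the numerator and the denominator.
    """
    non_correct = num_incorrect + num_no_answer
    if non_correct == 0:
        # Every question was answered correctly, so the ratio is undefined.
        raise ValueError("no incorrect or unanswered responses to measure")
    return num_no_answer / non_correct


# Hypothetical counts: 35 wrong answers and 65 "I don't know" responses.
print(abstention_share(num_incorrect=35, num_no_answer=65))  # 0.65
```

A model that got 900 of 1000 questions right and one that got 100 right could score the same here, which is why it's worth reading alongside plain accuracy.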
