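As measured by #_no_answer / (#_incorrect + #_no_answer), i.e. the share of non-correct responses where the model declines to answer instead of giving a wrong one, the top current models can do it 60-70% of the time (Grok 4.20 is the best, at 83%): https://artificialanalysis.ai/evaluations/omniscience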
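For concreteness, here is a minimal sketch of that ratio. The function name `abstention_rate` and the counts in the example are hypothetical and only illustrate the arithmetic. They are not taken from the leaderboard, and this may not match exactly how the site computes it.

```python
def abstention_rate(no_answer: int, incorrect: int) -> float:
    """Hypothetical helper: #_no_answer / (#_incorrect + #_no_answer).

    Correct answers are deliberately left out of the denominator, so this is
    the fraction of "didn't get it right" cases where the model abstained.
    """
    total = no_answer + incorrect
    if total == 0:
        raise ValueError("no incorrect or unanswered responses to measure")
    return no_answer / total


# Illustrative counts only: 65 abstentions and 35 wrong answers.
print(abstention_rate(no_answer=65, incorrect=35))  # 0.65
```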