It took me a while to figure out how to interpret the benchmark correctly. The overview page labels it "AA-Omniscience Non-Hallucination Rate," but the benchmark page https://artificialanalysis.ai/evaluations/omniscience#aa-omn... says "the lower, the better." Eventually I realized that the "Non-" flips the scale: on the overview page a higher number is better, while on the benchmark page a lower one is. Once you account for that, the results on the two pages are consistent.