SilverServer • today at 2:00 PM

It took me a while to figure out how to interpret the benchmark correctly, because the overview page says "AA-Omniscience Non-Hallucination Rate," but the benchmark page https://artificialanalysis.ai/evaluations/omniscience#aa-omn... says "the lower, the better." Eventually, I realized that the "non" reverses the scores, so on the overview page higher is better. And indeed, the results are consistent.
