> The non-hallucination rate in AA-omniscience is SOTA
Note that a perfect "non-hallucination rate" is rather meaningless, since such tests can themselves contain human hallucinations.
It means the model aligns with the possibly-true, possibly-false beliefs of the group that made the test.
Here are some examples of the questions in the benchmark. If these are representative, they seem pretty cut and dried. https://artificialanalysis.ai/evaluations/omniscience#exampl...
Was there something about this specific model and submission that made you feel compelled to write this self-evident observation?
Or would you describe your methodology as more like picking a random sentence fragment as an input value, then generating completions from your existing corpus without any post-input "learning" process related to the rest of the source material?
Well, yes: garbage in, garbage out. That's a given, and it's not what's meant by "hallucination" in this context.