Hacker News

throawayonthe yesterday at 6:32 PM

well there is https://artificialanalysis.ai/evaluations/omniscience


Replies

goldenarm yesterday at 7:03 PM

It's a gibberish input detection benchmark, and does not measure output hallucinations.