Hacker News

goldenarm, yesterday at 7:03 PM

It's a gibberish-input detection benchmark and does not measure output hallucinations.