Hacker News
goldenarm
•
yesterday at 7:03 PM
It's a gibberish-input detection benchmark; it does not measure output hallucinations.