Hacker News

pjdesno, today at 2:47 PM

They overstate their results in the headline.

In section 2, 34% of cases are found to have "substantive" disagreements, meaning ratings that differ by 2 or more buckets: True + Misleading, Mostly True + False, or True + False.

This is probably a better measure than the headline one. It's still a concerning fraction, although part of it is no doubt due to forcing "I don't know" cases to return an answer anyway.
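
For concreteness, here is a rough sketch of how a "2 or more buckets" measure like that could be computed. The four-bucket ordering (True, Mostly True, Misleading, False) is my inference from the three pairs listed in section 2, and the function names and sample ratings are made up for illustration; none of it is taken from the paper.

```python
# Assumed ordering of rating buckets, inferred from the pairs named in section 2.
BUCKETS = ["True", "Mostly True", "Misleading", "False"]
RANK = {name: i for i, name in enumerate(BUCKETS)}


def bucket_gap(a, b):
    """Number of buckets separating two ratings on the assumed scale."""
    return abs(RANK[a] - RANK[b])


def substantive_share(pairs, min_gap=2):
    """Fraction of rating pairs that differ by at least min_gap buckets."""
    if not pairs:
        raise ValueError("need at least one rated pair")
    hits = sum(1 for a, b in pairs if bucket_gap(a, b) >= min_gap)
    return hits / len(pairs)


# Hypothetical ratings, invented purely to exercise the function.
sample = [
    ("True", "Mostly True"),   # gap 1: not substantive
    ("True", "Misleading"),    # gap 2: substantive
    ("Mostly True", "False"),  # gap 2: substantive
    ("False", "False"),        # gap 0: agreement
]
print(substantive_share(sample))  # 0.5
```

Setting `min_gap=1` would count any disagreement at all, which is presumably closer to what the headline figure measures.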