False vs misleading doesn't seem like a disagreement?

singpolyma3 • today at 12:55 PM • 2 replies

Replies

According to the benchmark, it is. "Only one verdict bucket can be correct per claim, so any disagreement among the panel means at least one model's verdict is label-inconsistent under this 4-bucket rubric (True / Mostly True / Misleading / False)"
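
The quoted rule can be made concrete. Here is a minimal sketch, assuming each claim's panel output is just a list of verdict strings. The function names and sample claims are hypothetical and are not taken from the benchmark. If exactly one bucket is correct, then even in the best case every model outside the most popular bucket is wrong. So any split, including False vs. Misleading, forces at least one wrong verdict.

```python
from collections import Counter

# The 4-bucket rubric quoted above, in ordinal order.
BUCKETS = ("True", "Mostly True", "Misleading", "False")

def min_wrong_verdicts(verdicts):
    """Lower bound on how many panel verdicts must be wrong for one claim,
    given that exactly one bucket is correct: in the best case the most
    popular bucket is the right one, and everyone else is wrong."""
    for v in verdicts:
        if v not in BUCKETS:
            raise ValueError(f"unknown verdict: {v!r}")
    counts = Counter(verdicts)
    return len(verdicts) - max(counts.values())

def inconsistent_share(claims):
    """Fraction of claims where the panel is not unanimous, i.e. where at
    least one model's verdict must be label-inconsistent."""
    flagged = sum(1 for v in claims if min_wrong_verdicts(v) > 0)
    return flagged / len(claims)

# Hypothetical panel outputs, one list of verdicts per claim.
claims = [
    ["False", "Misleading", "False"],   # close buckets, but still a disagreement
    ["True", "True", "True"],           # unanimous
    ["True", "Mostly True", "False"],   # polar opposites present
]
print(inconsistent_share(claims))
```

Under this strict rule, the first claim counts exactly like the third. That is why a distance-aware agreement measure is useful on top of it.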


kostaj • today at 1:09 PM

Yes, they are much closer verdicts, and True vs. Mostly True is also close. I used Krippendorff's α (ordinal) so that closer disagreements are penalized less. 21% of the claims have models on polar opposite sides: at least one True and at least one False.
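
For anyone unfamiliar with the ordinal variant: it weights each disagreement by how far apart the two buckets are. The distance grows with how many observed verdicts fall between them. The sketch below uses the textbook coincidence-matrix formula, not necessarily the author's code. It assumes the bucket order True < Mostly True < Misleading < False and no missing verdicts. The function names and sample data are hypothetical.

```python
from itertools import permutations

# Assumed ordinal order of the rubric buckets.
BUCKETS = ("True", "Mostly True", "Misleading", "False")
RANK = {label: i for i, label in enumerate(BUCKETS)}

def krippendorff_alpha_ordinal(claims):
    """Krippendorff's alpha with the ordinal distance metric.
    `claims` is a list of per-claim verdict lists (one verdict per model)."""
    k = len(BUCKETS)
    # Coincidence matrix: each ordered pair of verdicts from different
    # models on the same claim adds 1/(m - 1), m = verdicts on that claim.
    o = [[0.0] * k for _ in range(k)]
    for verdicts in claims:
        ranks = [RANK[v] for v in verdicts]
        m = len(ranks)
        if m < 2:
            continue  # a lone verdict cannot be paired
        for i, j in permutations(range(m), 2):
            o[ranks[i]][ranks[j]] += 1.0 / (m - 1)

    n_c = [sum(row) for row in o]  # marginal total per bucket
    n = sum(n_c)

    def delta2(c, d):
        # Ordinal distance: the more observed verdicts lie between buckets
        # c and d (inclusive, minus half of each endpoint), the larger it is.
        lo, hi = min(c, d), max(c, d)
        return (sum(n_c[lo:hi + 1]) - (n_c[c] + n_c[d]) / 2) ** 2

    observed = sum(o[c][d] * delta2(c, d) for c in range(k) for d in range(k))
    expected = sum(n_c[c] * n_c[d] * delta2(c, d) for c in range(k) for d in range(k))
    if expected == 0:
        return float("nan")  # no variation in the data: alpha is undefined
    return 1.0 - (n - 1) * observed / expected

def polar_opposite_share(claims):
    """Fraction of claims with at least one True and at least one False
    (exact list membership, so "Mostly True" does not count as True)."""
    hits = sum(1 for v in claims if "True" in v and "False" in v)
    return hits / len(claims)

# Hypothetical two-model panels: same data except for the first claim.
close = [["True", "Mostly True"], ["False", "False"], ["True", "True"]]
far = [["True", "False"], ["False", "False"], ["True", "True"]]
print(krippendorff_alpha_ordinal(close), krippendorff_alpha_ordinal(far))  # roughly 0.78 and 0.44
print(polar_opposite_share(close), polar_opposite_share(far))
```

The only difference between the two toy panels is one adjacent split versus one True/False split. The ordinal metric scores the adjacent split as much milder, which is the behaviour described in the comment above.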

