Hacker News

onceonceonce, today at 2:34 PM

Teams I work with use the abstain rate to flag what goes to a human. Disagreement between models is the same idea. Your 67% is what makes "two cheap models, escalate when they fight" actually work. Without an abstain option, disagreement mostly looks like noise.
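
A rough sketch of that routing, assuming each model is just a function that returns a label or `None` to abstain. The names `Model`, `route`, and `escalation_rate` are made up for illustration and don't come from any particular library:

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical model type: takes an input, returns a label or None (abstain).
Model = Callable[[str], Optional[str]]


def route(item: str, model_a: Model, model_b: Model) -> Tuple[str, Optional[str]]:
    """Return ("auto", label) only if both models answer and agree.

    Anything else (either model abstains, or the two labels differ)
    is escalated as ("human", None).
    """
    a = model_a(item)
    b = model_b(item)
    if a is None or b is None:
        return ("human", None)  # at least one model abstained
    if a != b:
        return ("human", None)  # the models disagree
    return ("auto", a)


def escalation_rate(items: Iterable[str], model_a: Model, model_b: Model) -> float:
    """Fraction of items sent to a human; 0.0 for an empty input."""
    items = list(items)
    if not items:
        return 0.0
    escalated = sum(1 for it in items if route(it, model_a, model_b)[0] == "human")
    return escalated / len(items)
```

Treating an abstain and a disagreement the same way keeps the rule simple: a human only sees items where the cheap models could not produce one shared answer.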


Replies