Hacker News

neversupervised today at 2:23 PM | 1 reply

This is not how people use LLMs. If you ask one of these questions, you'd get a longer answer, often grounded in internet sources. I speculate that, given a smart human operator interpreting the results, those interpretations would converge across vendors more often than this report makes it seem.


Replies

tracker1 today at 3:38 PM

Even then, there can often be substantive disagreements based on context. Hence the need for even a "mostly true" or "mostly false" bucket.