
kostaj, today at 2:03 PM

@john_strinlai @gcr, it depends on the application. In many cases an "I don't know" answer is indeed better than a forced answer. But in many production systems, LLMs generate a response anyway.

Although they inherit the messiness of the real world, the majority of these claims are objective enough to be classified by human experts with access to research. I plan to human-label the 1,000 claims and publish follow-up research. I will also consider adding an "I don't know" bucket, as well as clear instructions about the meaning of each of the 4 buckets.



simonw, today at 2:06 PM

If you're going to run this again I also recommend encouraging the model to provide its rationale and then having it return the true/false/misleading/mostly-true/abstain label at the end of its response.

Models give much better answers when they can "think out loud" before answering, and storing that rationale will make it easier to understand why they picked different answers for ambiguous questions.
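
A minimal sketch of the rationale-first approach, assuming the model is asked to end its response with a `VERDICT:` line. The prompt wording, the `VERDICT:` marker, and the helper names `build_prompt` and `parse_response` are all illustrative assumptions, not anything from the original benchmark. It only covers building the prompt and splitting the stored rationale from the final label; the call to the model itself is left out.

```python
import re

# Label set taken from the comment above; the exact spelling is an assumption.
LABELS = ("true", "mostly-true", "misleading", "false", "abstain")

# Hypothetical prompt: reasoning first, verdict on the final line.
PROMPT_TEMPLATE = (
    "Claim: {claim}\n\n"
    "First explain your reasoning. Then, on the final line, write\n"
    "VERDICT: <one of {labels}>"
)

# Matches a line consisting only of "VERDICT: <label>" (case-insensitive).
VERDICT_RE = re.compile(
    r"^\s*VERDICT:\s*([a-z-]+)\s*$", re.IGNORECASE | re.MULTILINE
)


def build_prompt(claim):
    """Return the prompt text for a single claim."""
    return PROMPT_TEMPLATE.format(claim=claim, labels=", ".join(LABELS))


def parse_response(text):
    """Split a model response into (rationale, label).

    The label is None when no recognised verdict line is found, so
    malformed responses can be counted separately instead of guessed.
    """
    matches = list(VERDICT_RE.finditer(text))
    if not matches:
        return text.strip(), None
    last = matches[-1]  # use the final verdict if the model wrote several
    label = last.group(1).lower()
    if label not in LABELS:
        return text.strip(), None
    return text[: last.start()].strip(), label
```

Storing the rationale string alongside the label is what makes it possible to go back later and see why two models disagreed on an ambiguous claim.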

oofbey, today at 3:26 PM

In many cases “I don’t know” is the correct answer - for questions about events that happened after the training cutoff, if the model doesn’t have web search, that is undeniably the correct answer. You’re forcing it to guess unnaturally. That really feels like you’re trying to prove a point (that your service can’t be replaced by AI) instead of actually performing research into how AI can be helpfully applied to this topic.