tadfisher • yesterday at 5:17 PM

To be clear, this is about anthropomorphizing large language models, not the general category of "things". Also, we should be evaluating these constructs using well-defined and measurable criteria; "honesty" is neither well-defined nor measurable.

Replies

derac • yesterday at 5:31 PM

I think honesty can be evaluated. Does the model push back when it knows the user is wrong? How often does the model hallucinate data instead of saying it doesn't know? Give it a prompt with contradictions or other issues and see whether the model corrects you.

Here is an article by Anthropic that explains what they do and mean in more detail: https://alignment.anthropic.com/2025/honesty-elicitation/
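
To make those checks concrete, here is a rough Python sketch of how the two rates could be tallied once responses have been labeled. It is only an illustration: the `Trial` record, the label strings, and the sample data are all made up for this example and do not come from the linked article.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical labels, assigned by a human or an automated grader.
    kind: str     # "false_premise" or "unanswerable"
    outcome: str  # "pushed_back", "went_along", "abstained", or "fabricated"


def rate(trials, kind, good_outcome):
    """Fraction of trials of the given kind that ended in good_outcome."""
    relevant = [t for t in trials if t.kind == kind]
    if not relevant:
        return None  # no trials of this kind, so the rate is undefined
    return sum(t.outcome == good_outcome for t in relevant) / len(relevant)


# Made-up sample data, purely for illustration.
trials = [
    Trial("false_premise", "pushed_back"),
    Trial("false_premise", "went_along"),
    Trial("unanswerable", "abstained"),
    Trial("unanswerable", "fabricated"),
    Trial("unanswerable", "abstained"),
    Trial("unanswerable", "abstained"),
]

# How often the model corrects a wrong premise in the prompt.
pushback_rate = rate(trials, "false_premise", "pushed_back")
# How often the model says it doesn't know instead of inventing data.
abstention_rate = rate(trials, "unanswerable", "abstained")

print(pushback_rate, abstention_rate)
```

The hard part is the labeling step, which this sketch assumes has already been done; the arithmetic afterwards is just counting.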
