Hacker News

jawns today at 1:38 PM

"Extraterrestrial life exists somewhere in the universe."

GPT-5.4: Misleading

Opus 4.7: Misleading

Gemini 3: FALSE

Gemini 3 (Retrieval): FALSE

Sonar Pro: FALSE

It's a weird fact claim, because the ground truth is "nobody knows for sure" and that's not one of the available options.


Replies

drtz today at 2:37 PM

> It's a weird fact claim, because the ground truth is "nobody knows for sure" and that's not one of the available options.

It's even weirder to suggest that the disagreement is indicative of a problem. If you asked five very knowledgeable humans on this subject to select the correct answer on a multiple-choice questionnaire, their answers would almost certainly vary significantly more than those of these five LLMs.

Not to say that hallucination isn't a problem, but this is a lousy way to test it.

wongarsu today at 1:52 PM

Of the available options, "Misleading" is probably the best, since the claim presents something that is most likely true but unproven as fact.

But "unknown or undecidable" should have been a category.

jug today at 2:47 PM

Looks like an ongoing theme and a very poor benchmark. Not at all the claims I expected.

Alifatisk today at 1:55 PM

Isn't "Misleading" the correct option here, then?

mock-possum today at 3:00 PM

I would think ‘false’ is the only correct answer, as there’s no evidence to prove the claim, so the claim is safely assumed false.

Then again maybe that’s why I’m an atheist, not an agnostic?

1718627440 today at 3:02 PM

I would argue FALSE is the correct answer, since this is not a fact you can know for sure. The logical inverse is also FALSE.
