dubcanada • yesterday at 8:42 PM • 3 replies

grok is 17%? And that's the lowest, most models are like 80%+?

While hallucination is probably closer to 100% depending on the question. This benchmark makes no sense.

Replies

> While hallucination is probably closer to 100% depending on the question.

But the benchmark didn't ask those questions, and it seems grok is very good at saying it doesn't know the answer otherwise.

elAhmo • yesterday at 9:16 PM

No one serious uses grok.

MagicMoonlight • today at 10:40 AM

It makes sense. Grok is taught to answer the question, regardless of how explicit or extreme it is. These other models are taught to suppress any wrongthink. That's going to make it hard to answer things correctly. If you've been told not to give the true answer because it counts as wrongthink, then you'll have to make up an answer.
