The title got me, I'll admit it. Except the benchmark is a game where the models are told to lie.
I find it deeply funny, and I suppose a bit expected, that a Grok model appears at face value to be optimized for supposed truth-telling.
And to keep the e-mob off my back, I don't endorse Elon Musk.