Hacker News

aembleton, today at 1:45 PM

> You’re literally just using three different slot machines and claiming one is hot.

It's a fair point. I haven't run many queries across all of them and checked the answers, but if I want to ask one of them a question right now, it's Grok, just because I trust its answers more.


Replies

ToucanLoucan, today at 1:53 PM

It's not a methodology problem, it's a testability problem. LLMs are not deterministic. You can ask the same LLM the same question five times and you'll likely get at least three different answers.

Again. Slot machine.
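
For anyone wondering where that variation comes from, here is a toy sketch, not any real model's decoding code. It assumes the answer is drawn from a softmax over scores with a temperature setting; `toy_logits`, `sample_answer`, and the three candidate answers are made up for illustration.

```python
import math
import random
from collections import Counter


def sample_answer(logits, temperature, rng):
    """Pick one answer from a softmax over `logits` at the given temperature."""
    if temperature == 0:
        # Greedy decoding: always return the highest-scoring answer.
        return max(logits, key=logits.get)
    # Divide scores by the temperature, then exponentiate (shifted by the
    # maximum for numerical stability) to get unnormalized probabilities.
    scaled = {answer: score / temperature for answer, score in logits.items()}
    top = max(scaled.values())
    answers = list(scaled)
    weights = [math.exp(scaled[answer] - top) for answer in answers]
    return rng.choices(answers, weights=weights, k=1)[0]


# Hypothetical scores for three candidate answers to the same question.
toy_logits = {"A": 2.0, "B": 1.5, "C": 1.0}
rng = random.Random()  # unseeded, so every run of the script can differ

sampled = Counter(sample_answer(toy_logits, 1.0, rng) for _ in range(5))
greedy = Counter(sample_answer(toy_logits, 0.0, rng) for _ in range(5))
print("temperature 1.0:", dict(sampled))
print("temperature 0.0:", dict(greedy))
```

In this toy the five sampled pulls can split across the candidates while the greedy pulls all agree. Hosted models may still vary at temperature 0 for other reasons, so treat this as an illustration of one source of the variation only.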
