Hacker News

iLoveOncall, yesterday at 6:19 PM

Given that users preferred it to Sonnet 4.5 "only" in 70% of cases (according to their blog post), I highly doubt that this is representative of real-life usage. Benchmarks are just completely meaningless.


Replies

jwolfe, yesterday at 6:34 PM

For cases where 4.5 already met the bar, I would expect 50% preference each way. This makes it kind of hard to make any sense of that number, without a bunch more details.
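
To make the arithmetic behind that point concrete, here is a rough sketch. It assumes a simple two-bucket model that is not in the blog post: some share of cases is "saturated" (4.5 already met the bar, so preference is a coin flip), and the new model wins at some fixed rate on the rest. The function names and the shares tried are hypothetical, chosen only for illustration.

```python
# Hypothetical two-bucket model: not taken from the blog post.
# saturated_share: fraction of cases where the old model already met the bar,
#                  so raters are assumed to split 50/50.
# win_rate_elsewhere: how often the new model is preferred on the other cases.

def overall_preference(saturated_share, win_rate_elsewhere):
    """Blend a 50/50 split on saturated cases with the win rate on the rest."""
    return 0.5 * saturated_share + win_rate_elsewhere * (1.0 - saturated_share)


def implied_win_rate(overall, saturated_share):
    """Invert the blend: win rate on non-saturated cases needed to reach `overall`."""
    return (overall - 0.5 * saturated_share) / (1.0 - saturated_share)


if __name__ == "__main__":
    # The same 70% headline number is compatible with very different stories.
    for share in (0.0, 0.2, 0.4, 0.6):
        print(f"saturated share {share:.0%}: "
              f"implied win rate elsewhere {implied_win_rate(0.70, share):.1%}")
```

Under these assumptions, a 70% overall preference could mean a 70% win rate everywhere, or a near-100% win rate on the 40% of cases that were not already saturated. Without knowing the split, the headline number alone does not distinguish between them.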
