Hacker News

yunusabd, yesterday at 11:04 PM (1 reply)

Calling it SOTA might be a bit provocative, but what actually is the "state of the art"? We have benchmarks, but those are increasingly gamed and don't necessarily reflect a model's actual performance; see Opus 4.7. So I think it's useful to have real-world data from actual users as an additional data point.


Replies

miyoji, today at 12:32 PM

Maybe you shouldn't be relying on something if you can't even tell how good it is?