Hacker News

ranger_danger, yesterday at 10:12 PM (3 replies)

Just FYI, this article seems to define "state of the art" as "popular", as measured by "total mentions and user sentiment", without any bearing on the technical abilities or actual usage of the model.


Replies

yunusabd, yesterday at 11:04 PM

Calling it SOTA might be a bit provocative, but what actually is the "state of the art"? We have benchmarks, but those are increasingly gamed and don't necessarily reflect the actual performance of a model (see Opus 4.7). So I think it's useful to have real-world data from actual users as an additional data point.

mellosouls, yesterday at 10:24 PM

That's pretty much exactly what the title says.

The technical abilities and usage are derived from the commenters' reflections on their own usage.

swyx, today at 9:07 AM

and assuming all mentions are coding-model mentions just because it's on HN