Hacker News

XCSme, today at 11:46 AM

Something is odd with this model: their blog post shows REALLY good results, but in most third-party benchmarks people are finding it's not really SOTA, and it even scores below Kimi K2.6 and GLM-5/5.1.

In my tests too[0], it doesn't reach the top 10. One issue, which they also mentioned in their post, is that they can't really serve the model well at the moment, so V4-Pro is heavily rate-limited and gives a lot of timeout errors when I try to test it. This shouldn't be an issue in the long run, considering the model is open-source, but it makes it hard to test accurately right now.

[0]: https://aibenchy.com/compare/deepseek-deepseek-v4-flash-high...


Replies

dannyw, today at 12:20 PM

Hmm, Flash performs significantly better than Pro in the benchmark? That's very strange; could rate limiting cause that?
