Something is odd with this model: their blog post shows REALLY good results, but in most third-party benchmarks people are finding it's not really SOTA, even below Kimi K2.6 and GLM-5/5.1.
In my tests too[0], it doesn't reach the top 10. One issue, which they also mentioned in their post, is that they can't really serve the model well at the moment, so V4-Pro is heavily rate-limited and returns a lot of timeout errors when I try to test it. This shouldn't be a problem long-term, considering the model is open-source, but it makes the model hard to test accurately right now.
[0]: https://aibenchy.com/compare/deepseek-deepseek-v4-flash-high...
Hmm, Flash performs significantly better than Pro in that benchmark? That's very strange; could rate limiting cause that?