
GodelNumbering yesterday at 9:54 PM

True for Kimi, but the results I published are averaged across each provider's models (CF has over 10 models on OpenRouter). Your current Kimi K2.6 is over 80%, but Gemma 4 26B A4B is at 0%: https://openrouter.ai/google/gemma-4-26b-a4b-it

This is also why providers like Anthropic scored lower: while Opus 4.7 is close to 90%, Opus 4.5 is at 45%.
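To make the averaging effect concrete, here is a rough sketch using only the approximate figures quoted above (about 90% and 45% for the two Opus models, about 80% and 0% for Kimi K2.6 and Gemma 4 26B A4B). The `provider_average` helper and the two-model dictionaries are illustrative assumptions; the published averages cover more models per provider and will differ.

```python
from statistics import mean

def provider_average(scores: dict[str, float]) -> float:
    """Unweighted mean of per-model scores (percent) for one provider."""
    return mean(scores.values())

# Approximate figures from this comment only; not the full per-provider data.
anthropic = {"Opus 4.7": 90.0, "Opus 4.5": 45.0}
cf_subset = {"Kimi K2.6": 80.0, "Gemma 4 26B A4B": 0.0}

print(provider_average(anthropic))  # 67.5
print(provider_average(cf_subset))  # 40.0
```

One weak model is enough to pull a provider's average well below its best model's score.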


Replies

kflansburg today at 4:03 AM

My point was not about our ranking specifically, but about the methodology of taking a point-in-time sample.