Hacker News

deepdarkforest, last Thursday at 7:08 PM

By competitive, I mean No. 1 on LM Arena overall, in webdev, in image gen, in grounding, etc. Plus, leading the Chatbot Arena Elo. Flash is the most used model on OpenRouter this month as well. Gemma models are leading the on-device stats too. So yes, competitive.


Replies

oceanplexian, yesterday at 4:10 AM

Except for coding, where it’s essentially middle of the pack, and which is the only thing that you can build objective benchmarks around. The fact that people on LM Arena prefer the output has no relationship to how intelligent the model actually is.