Yes, it's a "smaller" (137B) model that competes with Haiku, but it performs basically on par with Qwen3.6-35B-A3B, which is 75% smaller overall and 98% smaller in terms of active parameters (since it's a mixture of experts model). Microsoft should be comparing its model to good smaller models, not Haiku 4.5.
Qwen-3.6-27b is closer to Claude Opus 4.7 than it is to Haiku 4.5 in a lot of benchmarks - and it's way smaller than Microsoft's new model.
Sure, it competes with Haiku, but it shows how far behind Microsoft is compared with lots of other small models that are available.
I understand what you’re saying, but I am generally very careful when comparing models and their benchmarks; benchmarks often don’t really match “real world” quality.
> 98% smaller in terms of active parameters (since it's a mixture of experts model).
I don’t think that’s right: this flash model has 5B active params. Qwen3.6-35B-A3B has 3B active, so it's 40% smaller.
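For anyone who wants to check the percentages, here is a quick sketch of the arithmetic. The parameter counts are just the figures quoted in this thread (137B total and 5B active for the flash model, 35B total and 3B active for Qwen3.6-35B-A3B) and are not independently verified; `percent_smaller` is a helper name made up for illustration.

```python
def percent_smaller(smaller: float, larger: float) -> float:
    """Return how much smaller `smaller` is than `larger`, as a percentage."""
    return (1 - smaller / larger) * 100


# Parameter counts in billions, as quoted in the thread (assumed, not verified).
FLASH_TOTAL = 137
FLASH_ACTIVE = 5
QWEN_TOTAL = 35
QWEN_ACTIVE = 3

# Total parameters: 35B vs 137B.
print(round(percent_smaller(QWEN_TOTAL, FLASH_TOTAL), 1))   # 74.5

# Qwen's active parameters vs the flash model's TOTAL parameters.
# This is the comparison that yields the "98% smaller" figure.
print(round(percent_smaller(QWEN_ACTIVE, FLASH_TOTAL), 1))  # 97.8

# Active vs active: 3B vs 5B.
print(round(percent_smaller(QWEN_ACTIVE, FLASH_ACTIVE), 1)) # 40.0
```

So the 98% figure only holds if you compare 3B active against all 137B parameters; active against active, it is 40%.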