Hacker News

Miraste, yesterday at 2:51 PM, 3 replies

What? 35B-A3B is not nearly as smart as 27B.


Replies

stratos123, yesterday at 8:50 PM

One interesting thing about Qwen3 is that, judging by the benchmarks, the 35B-A3B models seem to be only a bit worse than the dense 27B ones. This is very different from Gemma 4, where the 26B-A4B model is much worse than the 31B on several benchmarks (e.g. Codeforces, HLE).

ekianjo, yesterday at 2:56 PM

Yeah, the 27B feels like something completely different. If you use it on long-context tasks, it performs WAY better than 35B-A3B.

zkmon, yesterday at 2:55 PM

Yes.