zozbot234 • yesterday at 9:19 PM

> This is very different from Gemma 4, where the 26B-A4B model is much worse on several benchmarks (e.g. Codeforces, HLE) than 31B.

Wouldn't you totally expect that, since the 26B-A4B is lower on both total and active params than the 31B? The more sensible comparison would pit Qwen 27B against Gemma 31B, and Gemma 26B-A4B against Qwen 35B-A3B.

Replies

Hugsun • today at 11:49 AM

They're comparing the gap between Qwen's MoE and dense models (smaller) with the gap between Gemma's MoE and dense models (bigger). Your proposed alternative misses the point.
