They're comparing Qwen's MoE vs. dense gap (smaller) against Gemma's MoE vs. dense gap (bigger). Your proposed alternative misses the point.
Gemma's dense model is bigger than its MoE's total parameter count, so you could totally expect the MoE to do much worse by comparison.