I don't see it talked about much, but Gemma (and Gemini) use enormously fewer tokens per output than other models, while still staying within arm's reach of top benchmark performance.
It's not uncommon to see a gemma vs qwen comparison where qwen does a bit better but spends 22 minutes on the task, while gemma aligns the buttons wrong but spends only 4 minutes on the same prompt. So taken at face value, gemma is now underperforming leading open models by 5-10%, but doing it in a fraction of the time.
On Dwarkesh Patel's podcast, Dylan Patel from SemiAnalysis said that Google can currently afford to run larger models than its competitors because it has access to much more compute (TPUs, etc.).
That could explain the token usage difference, because larger models usually use fewer tokens for the same unit of intelligence.
Claude is very fashionable right now, but I've never had any problems or felt the need to switch.
Maybe after Google I/O, more people will catch on to how good it is.
This is true; we have the numbers to back it up at https://gertlabs.com/rankings?mode=oneshot_coding (check out the efficiency chart too).
GPT 5.5/5.4 are the smartest models, but at great cost in token and code bloat. Qwen 3.6 Max strikes a good balance. But Gemma 4 26B writes some really efficient code, with great results considering the model size. Things do start falling apart at longer contexts.
Gemini models, even if not as good at coding, are also competitive with GPT-5.5 and Claude Opus 4.7 on a lot of tasks while having considerably fewer parameters.
True, but you have to add up the cumulative token output if you're being fair. That alignment issue requires another round of input and output tokens to correct.
I think you can see this one of two ways: flipping it around, you could consider it a miracle that the qwen models are able to perform so well despite being trained on inefficient wrapper-code data.
One of the consequences of Gemma's speed is that you can run it on a GPU that's technically too small for it. I've run it on my 4070, and while the output wasn't blazingly fast, it was usable. (Though I haven't used it for anything complex yet. I'm sure that will be different.)
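To make the "technically too small" point concrete, here's a back-of-envelope VRAM estimate. The parameter count (26B, taken from the Gemma 4 26B mention above), the 4-bit quantization, and the ~15% overhead for KV cache and activations are all rough assumptions for illustration, not measurements:

```python
# Rough VRAM estimate: weight bytes plus an assumed ~15% overhead
# for KV cache and activations.
def vram_gb(params_b: float, bits_per_weight: float, overhead: float = 0.15) -> float:
    weights_gb = params_b * bits_per_weight / 8  # billions of params -> GB of weights
    return weights_gb * (1 + overhead)

# Hypothetical figures: a 26B model quantized to 4 bits vs. a 12 GB RTX 4070.
need = vram_gb(26, 4)
print(f"{need:.1f} GB needed")  # prints "14.9 GB needed"
```

Since ~15 GB doesn't fit in 12 GB of VRAM, some layers spill over to CPU RAM, which matches the "not blazingly fast, but usable" experience.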
Among benchmarkers it's a frequent topic. Qwen BURNS reasoning tokens to get its scores.
It won't really do much if you try to code with it. I plugged it into Xcode and it failed to change a variable.
Anecdotally, the 15/month basic Gemini plan allows coding all day. I'm not hitting the limits or needing to upgrade to the 100/month plan like other people are doing with Claude or Codex.
Caveat: Gemini has been dumbed down a few times over the last year, and rate limits have tightened up too. So it might not be this good in the future.