I have a 128 GB Strix Halo tablet (same as the other commenter here with the Framework Desktop). I'm using the larger Gemma 4 26B-A4B model (only 28 GB @ Q8), and it works great and runs very fast.
It's a 100% replacement for free ChatGPT/Gemini.
Compared to the paid pro/thinking models... Gemma does have reasoning, and I've used the reasoning mode recently for some tax, legal, and accounting advice, as well as other misc problems. It worked well for that, but I haven't tried any really difficult tasks. From what I've heard about agentic coding, the open-weight models are roughly 18-24 months behind Anthropic's and Google's SOTA.
Qwen 3.5 122B-A10B should just fit into 128 GB at Q4/Q5 quantization and may be a bit smarter. There's apparently also a similarly sized Gemma 4 model, but it hasn't been released yet; the 26B is the largest they've released so far.
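The "fits in 128 GB" claim is easy to sanity-check with the usual rule of thumb: weights take roughly params × bits-per-weight / 8 bytes, plus some headroom for KV cache, activations, and the OS. A minimal sketch (the bits-per-weight values are my approximations for Q4/Q5/Q8-class quants, not exact format sizes):

```python
# Back-of-envelope model weight size. Assumption: effective bits per
# weight of ~4.5 for Q4-class, ~5.5 for Q5-class, 8 for Q8-class quants;
# real GGUF files vary a bit and you still need headroom for KV cache.

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for params_b billion parameters."""
    return params_b * bits_per_weight / 8

# 26B total params at Q8 -> ~26 GB of weights, matching the ~28 GB loaded
print(weights_gb(26, 8))       # 26.0

# 122B at Q4- vs Q5-class: both leave room in 128 GB, Q5 less so
print(weights_gb(122, 4.5))    # 68.625
print(weights_gb(122, 5.5))    # 83.875
```

So even the Q5-class file leaves 40+ GB for context and the rest of the system.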
There's a 31B dense model in the Gemma 4 series that's obviously going to be smarter than the MoE 26B-A4B, though a whole lot slower, since decoding has to read all 31B weights per token instead of just the ~4B active ones.
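The dense-vs-MoE speed gap falls out of a simple bandwidth model: token generation on this kind of hardware is memory-bandwidth bound, so the t/s ceiling is roughly bandwidth divided by the bytes of weights read per token. A sketch, assuming Strix Halo's quoted ~256 GB/s LPDDR5X bandwidth and a made-up 60% efficiency factor (real numbers depend on the runtime and SKU):

```python
# Rough decode-speed ceiling for a bandwidth-bound LLM. Assumptions:
# 256 GB/s is Strix Halo's quoted memory bandwidth, and 0.6 is a
# hypothetical achieved-efficiency factor; treat outputs as upper bounds.

def max_tokens_per_s(active_params_b: float, bits_per_weight: float,
                     bandwidth_gbs: float = 256.0,
                     efficiency: float = 0.6) -> float:
    active_gb_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gbs * efficiency / active_gb_per_token

# MoE 26B-A4B: only ~4B params are read per token at Q8
print(round(max_tokens_per_s(4, 8)))    # ~38 t/s
# Dense 31B: every token reads all 31B params
print(round(max_tokens_per_s(31, 8)))   # ~5 t/s
```

That order-of-magnitude gap is why the MoE feels fast on this hardware even though the dense model isn't much bigger on disk.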
Sure, 26B models on beefy desktop silicon are finally nipping at the heels of commercial APIs, but this is a mobile thread. On a phone with 8 GB of RAM and passive cooling, your tokens per second (t/s) will fall off a cliff after the first minute of sustained compute.