Flat wrong. A Q6 Gemma 31b feels a lot like Opus 4.5 to me when run in a harness so it can retrieve information and ground itself. The gap is not that big for a lot of use cases. Qwen MoE is fast as fuck locally for things that are one-shottable. I have subscriptions to all the major providers right now, and since Gemma 4 and Qwen 3.6 came out I haven't hit limits a single time. I'm actually super surprised by the number of things I try with Gemma 4 intending to see how it fails (so Claude can redo them), only to come away with something perfectly usable from the local model.
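For anyone wondering what "harness" means here, a minimal sketch in Python: retrieve relevant context first, then stuff it into the prompt of a local model. This assumes an Ollama-style endpoint on localhost:11434; search_docs() is a hypothetical stand-in for whatever retrieval you have (grep, embeddings, a wiki dump), and the model tag is a placeholder for whatever you've pulled locally.

    import json
    import urllib.request

    def search_docs(query):
        # Hypothetical retrieval step: return text snippets relevant to the
        # query. Swap in grep, an embedding index, etc.
        return ["...snippet 1...", "...snippet 2..."]

    def ask(model, question):
        # Ground the model by prepending retrieved context to the prompt.
        context = "\n\n".join(search_docs(question))
        prompt = f"Answer using only this context:\n\n{context}\n\nQ: {question}"
        req = urllib.request.Request(
            "http://localhost:11434/api/chat",
            data=json.dumps({
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "stream": False,
            }).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["message"]["content"]

    print(ask("your-local-model", "How does the auth middleware work?"))

Most of the win comes from the retrieval step, not the loop itself; a small model with the right context in front of it fails far less often than the same model answering cold.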
What harness are you using?
I'm going to switch to local LLMs for most stuff soon.
Sorry, but you're just seeing what you want to see. The idea that a 31b model is anywhere in the ballpark of something like Opus 4.5 is absurd on its face.
I'm guessing Qwen 3.6 for agentic coding and Gemma 4 for non-coding stuff?
Your n=1 might not be very relevant outside your personal use. On less-contaminated benchmarks, Gemma 4 scores way below Sonnet 4.5, let alone the Opus models: https://swe-rebench.com/