I was getting dangerously close to my weekly Claude Code limit last night, so I had Claude set up Qwen3.6 with llama.cpp and OpenCode. Honestly, it's a great (free!) alternative to Claude Code, certainly more than good enough for a lot of smaller, less complex tasks. I'm excited to try this new version. The fact that open-source models are so close to the frontier is very impressive.
Which exact model are you using? With which parameters and quant, and on what hardware? Are you using any specific MCPs or other tools to optimize performance, like context-mode or dynamic context pruning? I’ve used local models a reasonable amount before, but I’m just starting out with opencode. I haven’t had great results yet, but I really want this to work for simpler tasks. My freshly installed opencode also keeps iTerm at 100% CPU when idle. :/
Qwen Max models are usually closed, unfortunately.
Do you have a feel for how Qwen 3.6 compares to Sonnet 4.6? Because in reality, that's what we use a lot. If we just used Opus 4.7 for everything code related, we'd have a monthly bill 10-20 times higher than using Sonnet where we can.
Qwen3.6 with Claude Code works great. I get a lot better results with that than with opencode and Qwen3.6. Claude Code is a great harness, and good harness/tool integration makes a big difference. You just need a settings.json with your Ollama setup and the Qwen model, and you can use it.
As an Opus maximalist ;) I was very surprised by the quality of Qwen3.6-27B. Trying to figure out how to get it going on an RTX 90k now to offload some lighter tasks :)
> Today we introduce Qwen3.7-Max, our latest proprietary model
This is not an open model
Which agentic coding tool, and how do you make sure you have prefix consistency?
This one doesn't seem to be open source though, sadly. Using Chinese servers is a step too far for me personally.
Out of interest, what machine and model are you running it on?
I tried the qwen3.6-27b Q6_K GGUF in llama.cpp and LM Studio on my M2 MacBook Pro 32GB machine last week, and I barely get a token a second with either.
What sort of speed should I be expecting?
I tried some of the Llama 3 34b (nous-capybara?) models two years ago with llama.cpp, and I seem to remember getting a few tokens a second then, so I'm not sure if I've got something completely misconfigured or I just have unreasonable expectations.
Or maybe Qwen 3.x is slower for some reason? (Is it a mixture of experts?)
I'm not expecting it to be instant, but what I'm currently seeing is not really usable.
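For a rough sanity check on what speed to expect, here is a back-of-envelope sketch, not a benchmark. It assumes decoding is memory-bandwidth bound and that the model is dense, so every weight is read once per generated token. The function name is made up for illustration, Q6_K is taken as about 6.5625 bits per weight, and the 200 GB/s (M2 Pro) and 400 GB/s (M2 Max) figures are Apple's published peak bandwidths, which real workloads will not fully reach.

```python
def decode_ceiling_tokens_per_s(params_billion: float,
                                bits_per_weight: float,
                                bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed for a dense, memory-bandwidth-bound model.

    Hypothetical helper: each generated token requires reading all weights
    once, so the ceiling is memory bandwidth divided by weight size.
    """
    weight_gb = params_billion * bits_per_weight / 8  # GB of weights read per token
    return bandwidth_gb_s / weight_gb


# Assumed inputs: 27B parameters, Q6_K at ~6.5625 bits/weight,
# peak memory bandwidth of 200 GB/s (M2 Pro) or 400 GB/s (M2 Max).
for chip, bandwidth in [("M2 Pro", 200.0), ("M2 Max", 400.0)]:
    ceiling = decode_ceiling_tokens_per_s(27, 6.5625, bandwidth)
    print(f"{chip}: at most ~{ceiling:.1f} tokens/s")
```

Under those assumptions the weights alone come to about 22 GB, and the ceiling works out to roughly 9 tokens/s on an M2 Pro and 18 tokens/s on an M2 Max. A measured 1 token/s is far below either figure, which would point at configuration (for example the model running on CPU, or memory pressure from fitting 22 GB plus context into 32 GB) rather than at the model being inherently that slow. If the model were a mixture of experts, only the active parameters would be read per token, so the ceiling would be higher, not lower.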