I tried the new qwen model in Codex CLI and in Roo Code and I found it to be pretty bad. For instance I told it I wanted a new vite app and it just started writing all the files from scratch (which didn’t work) rather than using the vite CLI tool.
Is there a better agentic coding harness people are using for these models? Based on my experience I can definitely believe the claims that these models are overfit to Evals and not broadly capable.
Have frontier lab do the plan which is the most time consuming part anyways and then local llm do the implementation. Frontier model can orchestrate your tickets, write a plan for them and dispatch local llm agents to implement at about 180 tokens/s, vllm can probably ,manage something like 25 concurrent sessions on RTX 6000 Do it all in a worktrees and then have frontier model do the review and merge. I am just a retired hobbyist but that's my approach, I run everything through gitea issues, each issue gets launched by orchestrator in a new tmux window and two main agents (implementer and reviewer get their own panes so I can see what's going on). I think claude code now has this aspect also somewhat streamlined but I have seen no need to change up my approach yet since I am just a retired hobbyist tinkering on my personal projects. Also right now I just use claude code subagents but have been thinking of trying to replace them with some of these Qwen 3.5 models because they do seem cpable and I have the hardware to run them.
What is "the new qwen model"? There are a dozen and you can get them in a dozen different quantizations (or more) which are of different quality each.
In my experience Qwen3.5/Qwen3-Coder-Next perform best in their own harness, Qwen-Code. You can also crib the system prompt and tool definitions from there though. Though caveat, despite the Qwen models being the state of the art for local models they are like a year behind anything you can pay for commercially so asking for it to build a new app from scratch might be a bit much.
[dead]
I've noticed that open weight models tend to hesitate to use tools or commands unless they appeared often in the training or you tell them very explicitly to do so in your AGENTS.md or prompt.
They also struggle at translating very broad requirements to a set of steps that I find acceptable. Planning helps a lot.
Regarding the harness, I have no idea how much they differ but I seem to have more luck with https://pi.dev than OpenCode. I think the minimalism of Pi meshes better with the limited capabilities of open models.