Hacker News

robertkarl, today at 6:23 PM

You can trade off latency, accuracy, and cost for any ML task. And with local models, the cost is effectively zero.

Having a local Qwen check another Qwen's work increases accuracy quite a bit, at the cost of more latency. You can't have your cake and eat it too.
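For anyone who wants to try the checker idea, here is a rough sketch of one way it could look. Everything in it is an assumption for illustration: `Model`, `generate_with_review`, the prompt wording, and the "APPROVE" convention are all hypothetical, and `worker` / `reviewer` stand in for whatever function sends a prompt to your local server and returns the text. The call count is returned to make the latency cost visible.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion string.
Model = Callable[[str], str]


def generate_with_review(
    task: str, worker: Model, reviewer: Model, max_rounds: int = 2
) -> tuple[str, int]:
    """Return (answer, number_of_model_calls). More calls means more latency."""
    answer = worker(task)
    calls = 1
    for _ in range(max_rounds):
        # Ask the second model to check the first model's work.
        verdict = reviewer(
            f"Task:\n{task}\n\nProposed answer:\n{answer}\n\n"
            "Reply APPROVE if the answer is correct, otherwise explain the problem."
        )
        calls += 1
        if verdict.strip().upper().startswith("APPROVE"):
            break
        # Feed the reviewer's objection back to the worker for another attempt.
        answer = worker(
            f"Task:\n{task}\n\nPrevious answer:\n{answer}\n\n"
            f"Reviewer feedback:\n{verdict}\n\nWrite a corrected answer."
        )
        calls += 1
    return answer, calls
```

Every review round adds one or two extra generations, which is exactly the latency you pay for the accuracy.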

In benchmarking local models, I'm having success increasing even a 9B Qwen's score on Terminal-Bench-adjacent problems just by asking it to plan, then handing the plan back to Qwen with a fresh context. Try it with Qwen3.5, an Unsloth Q4+ quant, and a thinking budget of around 1024 tokens.
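A minimal sketch of the plan-then-fresh-context loop, in case it helps. The names (`Generate`, `plan_then_execute`) and the prompt wording are made up for illustration, not the exact setup I benchmarked. `generate` is assumed to be a stateless function that sends one prompt to your local model and returns the text; the thinking budget is not modeled here because how you set it depends on your runtime.

```python
from typing import Callable

# Hypothetical type: a stateless call into the local model (prompt in, text out).
Generate = Callable[[str], str]


def plan_then_execute(task: str, generate: Generate) -> str:
    # Step 1: ask only for a plan, not a solution.
    plan = generate(
        "Write a short step-by-step plan for this task. Do not solve it yet.\n\n"
        f"Task:\n{task}"
    )
    # Step 2: fresh context. The second call sees only the task and the plan,
    # not the planning prompt or any reasoning produced while planning.
    return generate(
        f"Task:\n{task}\n\nPlan:\n{plan}\n\n"
        "Follow the plan and complete the task."
    )
```

The point of the second call is that nothing from the planning turn carries over except the plan text itself.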