If you're okay with sonnet level performance, this sounds like a straight upgrade. But I find t...

_345 • yesterday at 11:02 PM • 7 replies • view on HN

If you're okay with sonnet level performance, this sounds like a straight upgrade. But I find that sonnet messes up too much, that it ends up not being worth cost optimizing down to using it or another sonnet-level model. Glad to have this as an option though

Replies

2ndorderthought • yesterday at 11:05 PM

A lot of people are having good experiences doing things like using opus for designing and using locally hosted qwen3.6 for implementation.

I could see a serious cost reduction story by using opus for design and deepseek for implementation.

Personally I would avoid anthropic entirely. But I get why people don't.

➕ show 1 reply

chrsw • yesterday at 11:52 PM

I keep re-learning this lesson: I chug along with a lesser model then throw a problem at it that's too complex. Then I try different models until I give up and bring in Opus 4.6 to clean up.

➕ show 2 replies

maxdo • today at 2:53 AM

This is the problem: you need the best model, not just a good one, for: - Good architecture, which requires reading specs, code, etc. reads like: lots of tokens in/out - Bug fixing — same, plus logs, e.g. datadog

Once you've found the path, patches are trivial and the savings are tiny unless you're doing refactoring/cleanup.

testing gets more and more complicated. Take a look at opencode go, and you see this:

>Includes GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo->V2.5-Pro, MiMo-V2.5, Qwen3.5 Plus, Qwen3.6 Plus, MiniMax M2.5, MiniMax M2.7, >DeepSeek V4 Pro, and DeepSeek V4 Flash

and now on your own with bugs, all of these models can produce at scale. Am i missing anything in this picture. What is the real use of cheaper models?

➕ show 1 reply

Culonavirus • today at 5:06 AM

We're not yet at a point of saturation when all the frontier models would be of somewhat comparable "intelligence" and we could decide which to use based on other factors (speed, effective context window etc.), so I honestly don't see why would you (as a company or an employee) not use the best available model with the highest (or at least second highest) thinking effort. The fees are not exactly cheap, but not that expensive either.

➕ show 1 reply

mohsen1 • today at 6:40 AM

This has been my experience working on tsz.dev. Only Opus 4.7 and GPT 5.5 can really be productive for the remaining test cases.

willio58 • today at 12:17 AM

I don’t find this with sonnet at all. As long as I have a solid Claude.md and periodically review the output and enforce good code practices via basic CI gates I’ve rarely ever found myself having to switch to opus

➕ show 1 reply

alt Hacker News

Replies