Hacker News

doctoboggan today at 6:11 PM | 15 replies

The cost per task chart is telling me that I should _never_ use Sonnet 5 above medium effort level - Opus always performs better for a given cost. So I guess the takeaway is that if Sonnet 5 medium isn't good enough for you, switch models, not effort levels.
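The reasoning here is a dominance check: a configuration isn't worth picking if another one scores at least as well for no more cost. A minimal Python sketch, assuming each model/effort configuration is reduced to a single (cost per task, score) pair. The names `Config`, `dominated`, and `frontier` and the toy numbers are hypothetical, not read off the chart:

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str     # model plus effort level (hypothetical labels)
    cost: float   # cost per task, any consistent unit
    score: float  # benchmark score, higher is better


def dominated(c: Config, configs: List[Config]) -> bool:
    """True if another config costs no more, scores no less, and is strictly better on one axis."""
    return any(
        o.cost <= c.cost
        and o.score >= c.score
        and (o.cost < c.cost or o.score > c.score)
        for o in configs
        if o is not c
    )


def frontier(configs: List[Config]) -> List[Config]:
    """Keep only configs that no other config dominates (the cost/score Pareto frontier)."""
    return [c for c in configs if not dominated(c, configs)]


# Toy numbers for illustration only, not taken from any published chart.
configs = [
    Config("A-medium", cost=1.0, score=60.0),
    Config("A-high", cost=3.0, score=65.0),
    Config("B-low", cost=2.5, score=70.0),
]
print([c.name for c in frontier(configs)])  # ['A-medium', 'B-low']
```

If the chart reads as described, Sonnet 5 above medium would be the dominated points, with Opus configurations sitting on the frontier at those costs.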


Replies

2001zhaozhao today at 6:53 PM

There are two wrinkles to this:

- For Claude.ai subscriptions, I think Sonnet is much cheaper than Opus. This is why there was a separate "Sonnet only" usage bar on the Max tier for the longest time.

- For some tasks, the sheer volume of raw input tokens matters most, for example multimodal computer-use tasks. You can't make those any more efficient on Opus by turning down reasoning effort, so a cheaper model like Sonnet is useful for them.
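The point about input-heavy tasks is that reasoning effort mostly moves the output side of the bill. A rough Python sketch of per-task cost, assuming simple linear per-token pricing with reasoning tokens billed at the output rate. The function name, token counts, and prices are hypothetical, not a published price list:

```python
def task_cost(input_tokens: int, output_tokens: int,
              usd_per_m_input: float, usd_per_m_output: float) -> float:
    """USD cost of one task; assumes reasoning tokens are billed at the output rate."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


# Hypothetical input-heavy computer-use task: lots of screenshot tokens in,
# relatively few tokens out. Prices are illustrative only.
high = task_cost(2_000_000, 50_000, usd_per_m_input=3.0, usd_per_m_output=15.0)
low = task_cost(2_000_000, 20_000, usd_per_m_input=3.0, usd_per_m_output=15.0)
print(f"high effort: ${high:.2f}, low effort: ${low:.2f}")  # high effort: $6.75, low effort: $6.30
```

With these made-up numbers, cutting effort saves well under 10% because input tokens dominate, so a model with cheaper per-token rates helps far more than a lower effort level.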

AquinasCoder today at 6:45 PM

While I appreciate that they publish this information, it's increasingly hard to keep track of it all. I've lost my mental model of how the different models perform at different effort levels and which tasks each one is good at.

In practice, I just use the Claude Code default, which works well enough. But I wonder how much other users actually play around with these settings to optimize for their projects.

Torkel today at 6:20 PM

Yeah, I was looking at the same chart and was very surprised at where the curve sits relative to Opus... It feels like Sonnet 5 is "what if Opus had an extra-low effort level?"

energy123 today at 6:40 PM

The arguable caveat is that Sonnet may run faster (although that isn't certain, since it uses more tokens for the same task), so you could potentially get more done in a synchronous, iterative workflow.

I don't really believe this, however: so much time goes into cleaning up after models that, in my experience, a slower but more intelligent model is a net time saver.

goldenarm today at 9:12 PM

It's funny: the exact same thing happened to Gemini 3.5 Flash. A cheaper, more agentic model that ends up worse and more expensive than 3.5 Pro on low.

johnfn today at 6:38 PM

That's just one benchmark, though. Tab to the next one and Sonnet 5 performs better as effort goes up, just as you'd expect. I imagine the takeaway is that the performance-vs-effort tradeoff is task-dependent.

lucamark today at 7:26 PM

You're referring to the Agentic search chart, but if you look at the Agentic computer use chart, the cost is basically halved.

However, I'm also confused about the market positioning. It's too expensive for daily tasks (open source models are much cheaper), and it isn't a frontier model for complex real-world problems.

I've rarely used Sonnet, btw.

seiru today at 7:14 PM

Worth noting that the default chart there is for "agentic search performance", not coding. I didn't see an effort comparison for coding specifically.

booi today at 7:26 PM

I actually use Sonnet exclusively at the low effort level. It's too slow otherwise, and at higher effort levels it's strictly worse than Opus.

manojlds today at 7:13 PM

Opus 4.8 high does better than, and is cheaper than, Sonnet 5 xhigh.

intellijdd today at 6:43 PM

I noticed that as well, but with the introductory pricing, I wonder how true that is.

It would be great to see these charts with the promotional pricing, given that it’s in effect for about two whole months.

I guess I could get Sonnet 5 to do it.

al_borland today at 7:16 PM

What is a "task" in real-world terms? If it will be $15 per million output tokens, and high/xhigh is somewhere around $7.50 per task, does that mean a single task uses 500k tokens? That seems like it would add up fast.
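The 500k figure is just the per-task cost divided by the per-token price. A minimal Python sketch, assuming the whole $7.50 went to output tokens; input tokens are billed too in practice, so this is an upper bound on output tokens per task. The function name is hypothetical:

```python
def max_output_tokens(cost_per_task_usd: float, usd_per_million_output: float) -> float:
    """Upper bound on output tokens per task, assuming the entire task cost is output tokens."""
    return cost_per_task_usd / usd_per_million_output * 1_000_000


print(max_output_tokens(7.50, 15.0))  # 500000.0
```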

Natelinathan today at 8:06 PM

I just rewrote the /code-review skill Anthropic ships so that it uses Sonnet 4.6 for some tasks, since it was using Opus for simple git diff commands and similarly mechanical tasks (it launched 100+ agents for one of my diffs, c'mon). I wonder how Sonnet 5 will affect my usage.

Does anyone else have any token-saving measures for reviews?

nicce today at 7:27 PM

> Opus always performs better for a given cost.

Expect it to get deprecated sooner rather than later.

ZeWaka today at 6:42 PM

It's very interesting. Why even release a new product that underperforms at the same price level? Why not just lock it?

I guess it's probably a lot cheaper for them to run, and it cuts costs for them. Seems disingenuous, though.