Claude Sonnet 5

709 points • by marinesebastian • today at 5:59 PM • 385 comments

Comments

The jump in reasoning quality is noticeable. What's interesting is how it handles ambiguous instructions now — it seems to ask fewer clarifying questions and just makes a reasonable judgment call. That's a double-edged sword depending on your use case.

mellosty • today at 6:58 PM

Sonnet seems to be really expensive


baalimago • today at 6:49 PM

Not looking great for an upcoming IPO


benjiro29 • today at 6:21 PM

Anybody notice that they did not include Sonnet 5 Max in the "Agentic Search results", when comparing to Opus 4.8 ...

Based on the "Agentic Computer usage" chart, Sonnet 5 Max would have been off the "Agentic Search results" chart. lol ...

In short, Sonnet 5 Low/Medium is more cost-efficient if it's a task below Opus 4.8 Medium. For everything else it's expensive, and you're better off using Opus 4.8.

Why even release this model?


mellosty • today at 6:27 PM

It does not pass the "I want to wash my car, should I drive or walk?" test.


smallerfish • today at 6:20 PM

Ah that's why Opus has been so slow for the last couple of days.

prmph • today at 7:38 PM

So many things to think about regarding these "benchmarks":

- Do the ever-increasing scores on these benchmarks mean we will soon have models that approach 100%? And what would that even mean? That there is no more room for improvement?

- Would Anthropic (or any other model vendor, for that matter) ever release a newer model that scores lower? If not, does that mean they keep tweaking a new model they want to release until it shows an improvement over the prior model?

- Would it be more useful to move toward a comparative rather than absolute ranking?

guelo • today at 8:46 PM

Have they ever said what the difference is between Sonnet and Opus? Are they trained differently? Different architectures? Is Sonnet a distillation? Or is it just that Sonnet gets fewer resources for inference?

None of the other labs are doing this kind of long-lived two-model series.


artursapek • today at 8:41 PM

I run a proofreading benchmark that tests how well models can find and fix errors in English text. They get several passes in a simple agent loop. Sonnet 5 is definitely better than Sonnet 4.6, but inferior on both quality and cost to GLM 5.1, GLM 5.2, Gemini 3.1 Flash, and Gemini 3.1 Pro. https://revise.io/errata-bench

ai_fry_ur_brain • today at 8:38 PM

Finally a model release where everyone is realising the scam. The world is healing (maybe).

joaohaas • today at 7:38 PM

Important to note that the cost graphs are heavily distorted. The agentic search one, for example, is divided into three 'columns': $0-$2, $2-$5 and $5-$10.

And yet, the $2-$5 section is the widest, even though it only contains a single point.

I can't even say if this is making the product look better or not, but it sure is weird. Maybe Claude just hallucinated those splits xD

tensegrist • today at 6:01 PM

there was a vibecoded prediction market–style page that was put up yesterday (?) that got the date exactly right i think


PeterStuer • today at 7:20 PM

Anyone else feel like Opus 4.8 got significantly dumber over the last 2 weeks?

kvetching • today at 9:40 PM

GLM 5.2 is better and cheaper. Maybe they are trying to embarrass Trump by making it look like we are losing to China.

Scroll_Swe • today at 6:10 PM

I don't pay so I'm glad for the upgrade. I usually use Gemini, Mistral Le Chat (Vibe...) or Deepseek as they have way more generous free limits and I can basically spam forever.

docheinestages • today at 6:23 PM

Is it just me or is there a huge difference between how much one can accomplish in a 5-hour window with GPT 5.5 on xhigh versus any Claude model?


jchw • today at 6:16 PM

American AI company status: We are now bragging about how bad our models are unironically.

Okay.

_pdp_ • today at 7:05 PM

Too expensive?

gverrilla • today at 6:45 PM

Is this the default model for non-paying users? If so, that could be an interesting move in the competition for this segment.

andrewchambers • today at 7:40 PM

The whole fable fiasco really soured me on Anthropic. This just looks disappointing by comparison.

ekjhgkejhgk • today at 6:31 PM

In effective terms they're lowering prices.

micromacrofoot • today at 6:26 PM

So they repackaged Fable and added "don't scare the government" to the prompt


Getchowned • today at 6:51 PM

Fable soon please.

moomin • today at 6:15 PM

I feel like this is a bit of a disappointment. Sonnet 4 was a clear step above Opus 3.x, while this is a lot muddier.

mesmertech • today at 6:08 PM

OK, that's a one-month clock to the next Opus model at least, so that's a silver lining to a meh model.

stackedinserter • today at 6:29 PM

"Our new model is proudly dumber now!"


varispeed • today at 7:41 PM

What is the point if it is one Trump brain fart away from being blocked?

lucynight • today at 6:13 PM

AMAZING
