Hacker News

Claude Sonnet 5

709 points by marinesebastian today at 5:59 PM | 385 comments

Comments

docproof today at 7:04 PM

The jump in reasoning quality is noticeable. What's interesting is how it handles ambiguous instructions now — it seems to ask fewer clarifying questions and just makes a reasonable judgment call. That's a double-edged sword depending on your use case.

mellosty today at 6:58 PM

Sonnet seems to be really expensive.

baalimago today at 6:49 PM

Not looking great for an upcoming IPO

benjiro29 today at 6:21 PM

Did anybody notice that they did not include Sonnet 5 Max in the "Agentic Search results" chart when comparing to Opus 4.8...

Based on the "Agentic Computer usage" results, Sonnet 5 Max would have been off the "Agentic Search results" chart. lol...

In short, Sonnet 5 Low/Medium is more cost-efficient if it's a task below Opus 4.8 Medium. For everything else it's expensive, and you're better off using Opus 4.8.

Why even release this model?

mellosty today at 6:27 PM

It does not pass the "I want to wash my car, should I drive or walk?" test.

smallerfish today at 6:20 PM

Ah that's why Opus has been so slow for the last couple of days.

prmph today at 7:38 PM

So many things to think about regarding these "benchmarks":

- Do the ever-increasing scores on them mean we will soon have models that approach 100%? And what would that even mean? That there is no more room for improvement?

- Would Anthropic (or any other model vendor, for that matter) ever release a newer model that scores lower? If not, does that mean they keep tweaking a new model they want to release until it shows an improvement over the prior model?

- Would it be more useful to move toward a comparative rather than absolute ranking?

guelo today at 8:46 PM

Have they ever said what the difference is between Sonnet and Opus? Are they trained differently? Different architectures? Is Sonnet a distillation? Or is it just that Sonnet gets fewer resources for inference?

None of the other labs are doing this kind of long lived two model series.

artursapek today at 8:41 PM

I run a proofreading benchmark that tests how well models can find and fix errors in English text. They get several passes in a simple agent loop. Sonnet 5 is definitely better than Sonnet 4.6, but inferior on both quality and cost to GLM 5.1, GLM 5.2, Gemini 3.1 Flash, and Gemini 3.1 Pro. https://revise.io/errata-bench

ai_fry_ur_brain today at 8:38 PM

Finally a model release where everyone is realising the scam. The world is healing (maybe).

joaohaas today at 7:38 PM

Important to note that the cost graphs are heavily distorted. The agentic search one, for example, is divided into 3 'columns': $0-$2, $2-$5 and $5-$10.

And yet, the $2-$5 section is the widest, even though it only contains a single point.

I can't even say if this is making the product look better or not, but it sure is weird. Maybe Claude just hallucinated those splits xD

tensegrist today at 6:01 PM

there was a vibecoded prediction market–style page that was put up yesterday (?) that got the date exactly right i think

PeterStuer today at 7:20 PM

Anyone else feel like Opus 4.8 got significantly dumber over the last 2 weeks?

kvetching today at 9:40 PM

GLM 5.2 is better and cheaper. Maybe they are trying to embarrass Trump by making it look like we are losing to China.

Scroll_Swe today at 6:10 PM

I don't pay, so I'm glad for the upgrade. I usually use Gemini, Mistral Le Chat (Vibe...) or DeepSeek, as they have way more generous free limits and I can basically spam forever.

docheinestages today at 6:23 PM

Is it just me or is there a huge difference between how much one can accomplish in a 5-hour window with GPT 5.5 on xhigh versus any Claude model?

jchw today at 6:16 PM

American AI company status: We are now bragging about how bad our models are unironically.

Okay.

_pdp_ today at 7:05 PM

Too expensive?

gverrilla today at 6:45 PM

Is this the default model for non-paying users? If so, that could be an interesting move in the competition for this segment.

andrewchambers today at 7:40 PM

The whole Fable fiasco really soured me on Anthropic. This just looks disappointing by comparison.

ekjhgkejhgk today at 6:31 PM

In effective terms they're lowering prices.

micromacrofoot today at 6:26 PM

So they repackaged Fable and added "don't scare the government" to the prompt

Getchowned today at 6:51 PM

Fable soon please.

moomin today at 6:15 PM

I feel like this is a bit of a disappointment. Sonnet 4 was a clear step above Opus 3.x, while this is a lot muddier.

mesmertech today at 6:08 PM

OK, that's a one-month clock to the next Opus model at least, so that's a silver lining to a meh model.

stackedinserter today at 6:29 PM

"Our new model is proudly dumber now!"

varispeed today at 7:41 PM

What is the point if it is one Trump brain fart away from being blocked?

lucynight today at 6:13 PM

AMAZING