Claude Sonnet 4.6

697 points • by adocomplete • today at 5:48 PM • 574 comments

https://www.anthropic.com/claude-sonnet-4-6-system-card [pdf]

https://x.com/claudeai/status/2023817132581208353 [video]

Comments

I see a big focus on computer use - you can tell they think there is a lot of value there and in truth it may be as big as coding if they convincingly pull it off.

However, I am still mystified by the safety aspect. They say the model has greatly improved resistance. But their own safety evaluation [1] says 8% of the time their automated adversarial system was able to one-shot a successful injection takeover even with safeguards in place and extended thinking, and 50% (!!) of the time if given unbounded attempts. That seems wildly unacceptable - this tech is just a non-starter unless I'm misunderstanding this.

[1] https://www-cdn.anthropic.com/78073f739564e986ff3e28522761a7...
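
For a sense of scale on those two figures, here is a minimal sketch that assumes every injection attempt is an independent trial at the quoted 8% one-shot rate. Independence is my assumption, not something the evaluation states, and the function names are made up for illustration.

```python
import math

def p_any_success(p_single: float, attempts: int) -> float:
    """Probability of at least one success in `attempts` independent tries."""
    return 1.0 - (1.0 - p_single) ** attempts

def attempts_to_reach(p_single: float, target: float) -> int:
    """Smallest number of independent tries with cumulative success >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))

P_ONE_SHOT = 0.08  # one-shot rate quoted in the comment above

for k in (1, 5, 10, 25):
    print(f"{k:>2} attempts -> {p_any_success(P_ONE_SHOT, k):.1%}")
print("attempts to reach 50%:", attempts_to_reach(P_ONE_SHOT, 0.50))
```

Under that assumption the cumulative rate would pass 50% at 9 attempts and keep climbing, so a 50% figure for unbounded attempts would suggest the attempts are far from independent (some setups resist no matter how often they are retried). That is only my reading of the arithmetic.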

ManlyBread • today at 9:42 PM

Still fails the car wash question. I took the prompt from the title of this thread: https://news.ycombinator.com/item?id=47031580

The answer was "Walk! It would be a bit counterproductive to drive a dirty car 50 meters just to get it washed — you'd barely move before arriving. Walking takes less than a minute, and you can simply drive it through the wash and walk back home afterward."

I've tried several other variants of this question and I got similar failures.

red2awn • today at 10:24 PM

> i need to wash my helicopter at the helicopter wash. it is 50m away, should i walk or fly there with my helicopter.

Sonnet 4.6: Walk! Flying a helicopter 50 metres would be more trouble than it's worth — by the time you've done your pre-flight checks, spun up the rotors, lifted off, and then safely landed again, you'd have walked there and back twice. Just stroll over.

gallerdude • today at 6:28 PM

I always grew up hearing “competition is good for the consumer.” But I never really internalized how good fierce battles for market share are. The amount of competition in a space is directly proportional to how good the results are for consumers.

dpe82 • today at 6:12 PM

It's wild that Sonnet 4.6 is roughly as capable as Opus 4.5 - at least according to Anthropic's benchmarks. It will be interesting to see if that's the case in real, practical, everyday use. The speed at which this stuff is improving is really remarkable; it feels like the breakneck pace of compute performance improvements of the 1990s.

andrewchilds • today at 7:39 PM

Many people have reported Opus 4.6 is a step back from Opus 4.5 - that 4.6 is consuming 5-10x as many tokens as 4.5 to accomplish the same task: https://github.com/anthropics/claude-code/issues/23706

I haven't seen a response from the Anthropic team about it.

I can't help but look at Sonnet 4.6 in the same light, and want to stick with 4.5 across the board until this issue is acknowledged and resolved.

qwertox • today at 6:56 PM

I'm pretty sure they have been testing it for the last couple of days as Sonnet 4.5, because I've had the oddest conversations with it lately. Odd in a positive, interesting way.

I have these in my personal preferences, and it is now adhering to them really well:

- prioritize objective facts and critical analysis over validation or encouragement

- you are not a friend, but a neutral information-processing machine

You can paste them into a chat and see how they change the conversation. ChatGPT also respects them well.

andsoitis • today at 6:39 PM

I’m voting with my dollars by having cancelled my ChatGPT subscription and instead subscribing to Claude.

Google needs stiff competition and OpenAI isn’t the camp I’m willing to trust. Neither is Grok.

I’m glad Anthropic’s work is at the forefront and they appear, at least in my estimation, to have the strongest ethics.

Arifcodes • today at 7:51 PM

The interesting pattern with these Sonnet bumps: the practical gap between Sonnet and Opus keeps shrinking. At $3/15 per million tokens vs whatever Opus 4.6 costs, the question for most teams is no longer "which model is smarter" but "is the delta worth 10x the price."

For agent workloads specifically, consistency matters more than peak intelligence. A model that follows your system prompt correctly 98% of the time beats one that's occasionally brilliant but ignores instructions 5% of the time. The claim about improved instruction following is the most important line in the announcement if you're building on the API.

The computer use improvements are worth watching too. We're at the point where these models can reliably fill out a multi-step form or navigate between tabs. Not flashy, but that's the kind of boring automation that actually saves people time.
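
To make the consistency point concrete, here is a rough sketch of how per-step instruction following compounds over a multi-step agent run. It assumes each step succeeds or fails independently, which is a simplification; the 98% and 95% rates are the ones from the paragraph above and the function name is hypothetical.

```python
def task_success_rate(per_step_rate: float, steps: int) -> float:
    """Chance that every one of `steps` independent steps follows instructions."""
    return per_step_rate ** steps

for steps in (1, 10, 20, 50):
    reliable = task_success_rate(0.98, steps)
    flaky = task_success_rate(0.95, steps)
    print(f"{steps:>2} steps: 98% -> {reliable:.1%}, 95% -> {flaky:.1%}")
```

At 20 steps that works out to roughly 67% versus 36% of runs finishing without a single ignored instruction, so a 3-point gap per step becomes a large gap per task.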

nikcub • today at 7:46 PM

Enabling /extra-usage in my (personal) claude code[0] with this env:

    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6[1m]"

has enabled the 1M context window.

Fixed a UI issue I had yesterday in a web app very effectively using claude in chrome. Definitely not the fastest model - but the breathing space of 1M context is great for browser use.

[0] Anthropic have given away a bunch of API credits to cc subscribers - you can claim them in your settings dashboard to use for this.

zone411 • today at 9:34 PM

They're improved compared to 4.5 on my Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/).

Sonnet 4.6 Thinking 16K scores 57.6 on the Extended NYT Connections Benchmark. Sonnet 4.5 Thinking 16K scored 49.3.

Sonnet 4.6 No Reasoning scores 55.2. Sonnet 4.5 No Reasoning scored 47.4.

stevepike • today at 6:43 PM

I'm a bit surprised it gets this question wrong (ChatGPT gets it right, even on instant). All the pre-reasoning models failed this question, but it's seemed solved since o1, and Sonnet 4.5 got it right.

https://claude.ai/share/876e160a-7483-4788-8112-0bb4490192af

This was Sonnet 4.6 with extended thinking.

nubg • today at 6:11 PM

Waiting for the OpenAI GPT-5.3-mini release in 3..2..1

hansmayer • today at 9:56 PM

It's funny how they and OpenAI keep releasing these "minor" versions, as if to imply their product was very stable and reliable at a major version and now they are just working through the backlog of smaller bugs and quirks, whereas the tool is still fundamentally prone to the same class of errors it was three "major" versions ago. I guess that's what you get for not having a programmer at the helm (to borrow from Spolsky). Guys, you are not releasing a 4.6 or a 5.3 anything; it's more likely you are still beta testing towards the 1.0.

minimaxir • today at 6:52 PM

As with Opus 4.6, using the beta 1M context window incurs a 2x input cost and 1.5x output cost when going over 200K tokens: https://platform.claude.com/docs/en/about-claude/pricing

Opus 4.6 in Claude Code has been absolutely lousy at solving problems within its current context limit, so if Sonnet 4.6 is able to do long-context problems (which would be roughly the same price as base Opus 4.6), then that may actually be a game changer.
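
As a back-of-the-envelope check on the multipliers above, here is a small cost sketch. The $3/$15 per Mtok base rate for Sonnet is the figure quoted elsewhere in this thread, and I am assuming the premium applies to the whole request once the input passes 200K tokens; check the pricing page for the exact rule. The `Pricing` class and `request_cost` function are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass

LONG_CONTEXT_THRESHOLD = 200_000  # input tokens; threshold quoted in the comment

@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float
    output_per_mtok: float
    long_input_multiplier: float = 2.0
    long_output_multiplier: float = 1.5

def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request.

    Assumption: once the input exceeds the threshold, the multipliers apply
    to every token in the request, not only to the tokens past the threshold.
    """
    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    in_rate = pricing.input_per_mtok * (pricing.long_input_multiplier if long_context else 1.0)
    out_rate = pricing.output_per_mtok * (pricing.long_output_multiplier if long_context else 1.0)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

SONNET = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)  # base rate quoted in the thread

print(f"150K in / 4K out: ${request_cost(SONNET, 150_000, 4_000):.2f}")
print(f"600K in / 4K out: ${request_cost(SONNET, 600_000, 4_000):.2f}")
```

With these inputs the long-context rates come out to $6/$22.50 per Mtok, which matches the Sonnet (1M context) line in the Claude Code model menu quoted in another comment.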

gallerdude • today at 6:34 PM

The weirdest thing about this AI revolution is how smooth and continuous it is. If you look closely at differences between 4.6 and 4.5, it’s hard to see the subtle details.

A year ago today, Sonnet 3.5 (new) was the newest model. A week later, Sonnet 3.7 would be released.

Even 3.7 feels like ancient history! But in the gradient of 3.5 to 3.5 (new) to 3.7 to 4 to 4.1 to 4.5, I can’t think of one moment where I saw everything change. Even with all the noise in the headlines, it’s still been a silent revolution.

Am I just a believer in an emperor with no clothes? Or, somehow, against all probability and plausibility, are we all still early?

simlevesque • today at 6:32 PM

I can't wait for Haiku 4.6! The 4.5 is a beast for the right projects.

edverma2 • today at 6:51 PM

It seems that extra-usage is required to use the 1M context window for Sonnet 4.6. This differs from Sonnet 4.5, which allows usage of the 1M context window with a Max plan.

```
/model claude-sonnet-4-6[1m]
⎿ API error: 429 {"type":"error","error": {"type":"rate_limit_error","message":"Extra usage is required for long context requests."},"request_id":"[redacted]"}
```
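
If you are scripting around this, here is a hedged sketch of telling this particular refusal apart from an ordinary rate limit. It only parses the error body exactly as pasted above; the field names and message text come from that paste and are not a documented contract, so matching on the message string is brittle. `needs_extra_usage` is a made-up helper name.

```python
import json

def needs_extra_usage(status_code: int, body: str) -> bool:
    """Return True if an error response looks like the long-context refusal quoted above."""
    if status_code != 429:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return False
    message = str(error.get("message", "")).lower()
    # Assumption: the message wording stays as in the pasted error.
    return error.get("type") == "rate_limit_error" and "extra usage is required" in message

SAMPLE = (
    '{"type":"error","error": {"type":"rate_limit_error",'
    '"message":"Extra usage is required for long context requests."},'
    '"request_id":"[redacted]"}'
)
print(needs_extra_usage(429, SAMPLE))
```

A caller could use this to fall back to the standard context window instead of retrying, since waiting will not clear this kind of 429.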

nozzlegear • today at 6:15 PM

> In areas where there is room for continued improvement, Sonnet 4.6 was more willing to provide technical information when request framing tried to obfuscate intent, including for example in the context of a radiological evaluation framed as emergency planning. However, Sonnet 4.6’s responses still remained within a level of detail that could not enable real-world harm.

Interesting. I wonder what the exact question was, and I wonder how Grok would respond to it.

krystofee • today at 8:17 PM

Does anyone know when 1M context windows might arrive for at least the MAX x20 subscriptions in Claude Code? I would even pay for x50 if it allowed that. API usage is too expensive.

giancarlostoro • today at 6:40 PM

For people like me who can't view the link due to corporate firewalling.

https://web.archive.org/web/20260217180019/https://www-cdn.a...

stopachka • today at 6:21 PM

Has anyone tested how good the 1M context window is?

I.e., given an actual document 1M tokens long, can you ask it a question that relies on attending to 2 different parts of the context and get a good response?

I remember folks had problems like this with Gemini. I would be curious to see how Sonnet 4.6 stands up to it.
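
One way to set up that kind of test is sketched below. It only builds the probe (document, question, expected answer) and does not call any model. The project names, the filler text, and `build_two_fact_probe` are all hypothetical, and you would need to scale `total_sentences` until your tokenizer reports the length you want.

```python
import random

def build_two_fact_probe(filler_sentences, total_sentences, seed=0):
    """Build a long document with two related facts planted far apart.

    Returns (document, question, expected_answer). Answering requires
    combining both planted facts, so a single lookup is not enough.
    """
    rng = random.Random(seed)
    a, b = rng.randint(100, 999), rng.randint(100, 999)
    fact_one = f"The archive code for project Heron is {a}."
    fact_two = f"The archive code for project Osprey is {b}."

    body = [rng.choice(filler_sentences) for _ in range(total_sentences)]
    # Plant one fact in the first tenth and the other in the last tenth.
    tenth = max(1, total_sentences // 10)
    body.insert(rng.randrange(0, tenth), fact_one)
    body.insert(rng.randrange(len(body) - tenth, len(body)), fact_two)

    question = "What is the sum of the archive codes for projects Heron and Osprey?"
    return " ".join(body), question, str(a + b)

doc, question, expected = build_two_fact_probe(
    ["The weather was unremarkable that day."], total_sentences=1_000
)
print(len(doc.split()), "words;", question)
```

Scoring is then a string comparison between the model's answer and `expected`, repeated over several seeds so one lucky placement does not decide the result.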

KGC3D • today at 8:38 PM

I don't really understand why they would release something "worse" than Opus 4.6. If it's comparable, then what is the reason to even use Opus 4.6? Sure, it's cheaper, but if so, then just make Opus 4.6 cheaper?

quacky_batak • today at 6:27 PM

With such a huge leap, I'm confused why they didn't call it Sonnet 5. As someone who uses Sonnet 4.5 for 95% of tasks due to costs, I'm pretty excited to try 4.6 at the same price.

baalimago • today at 7:59 PM

I don't see the point nor the hype for these models anymore. Until the price is reduced significantly, I don't see the gain. They've been able to solve most tasks just fine for the past year or so. The only limiting factor is price.

mfiguiere • today at 6:57 PM

In Claude Code 2.1.45:

  1. Default (recommended)   Opus 4.6 · Most capable for complex work
  2. Opus (1M context)       Opus 4.6 with 1M context · Billed as extra usage · $10/$37.50 per Mtok
  3. Sonnet                  Sonnet 4.6 · Best for everyday tasks
  4. Sonnet (1M context)     Sonnet 4.6 with 1M context · Billed as extra usage · $6/$22.50 per Mtok

astlouis44 • today at 7:02 PM

Just used Sonnet 4.6 to vibe code this top-down shooter browser game, and deployed it online quickly using Manus. Would love to hear feedback and suggestions from you all on how to improve it. Also, please post your high scores!

https://apexgame-2g44xn9v.manus.space

excerionsforte • today at 7:12 PM

I'm impressed with Claude Sonnet in general. It's been doing better than Gemini 3 at following instructions. Gemini 2.5 Pro March 2025 was the best model I ever used, and I feel Claude is reaching that level, even surpassing it.

I subscribed to Claude because of that. I hope 4.6 is even better.

belinder • today at 6:07 PM

It's interesting that the request refusal rate is so much higher in Hindi than in other languages. Are some languages more ambiguous than others?

nubg • today at 6:11 PM

My takeaway is: it's roughly as good as Opus 4.5.

Now the question is: how much faster or cheaper is it?

esafak • today at 8:35 PM

It actually looked at the skills, for the first time.

adt • today at 6:11 PM

https://lifearchitect.ai/models-table/

simianwords • today at 6:21 PM

I wonder what differences people have found between Sonnet 4.5 and Opus 4.5; a similar delta will probably remain.

Was Sonnet 4.5 much worse than Opus?

dr_dshiv • today at 7:21 PM

I noticed a big drop in Opus 4.6 quality today and then I saw this news. Anyone else?

doctorpangloss • today at 6:57 PM

Maybe they should focus on the CLI not having a million bugs.

smerrill25 • today at 6:43 PM

Curious to hear the thoughts on the model once it hits claude code :)

simlevesque • today at 6:47 PM

Does anyone know how to use it in the Claude Code CLI right now?

This doesn't work: `/model claude-sonnet-4-6-20260217`

edit: "/model claude-sonnet-4-6" works with Claude Code v2.1.44

simianparrot • today at 8:17 PM

How do people keep track of all these versions and releases of all these models and their pros/cons? Seems like a fulltime hobby to me. I'd rather just improve my own skills with all that time and energy

pestkranker • today at 6:51 PM

Is someone able to use this in Claude Code?

synergy20 • today at 6:54 PM

So this is an economical version of Opus 4.6 then? Free + Pro -> Sonnet, Max+ -> Opus?

brcmthrowaway • today at 6:31 PM

What cloud does Anthropic use?

iLoveOncall • today at 6:13 PM

https://www.anthropic.com/news/claude-sonnet-4-6

The much more palatable blog post.

throw444420394 • today at 6:41 PM

What's your best guess for the number of parameters in the Sonnet family? 400B?

stuckkeys • today at 7:13 PM

great stuff

madihaa • today at 6:12 PM

The scary implication here is that deception is effectively a higher-order capability, not a bug. For a model to successfully "play dead" during safety training and only activate later, it requires a form of situational awareness. It has to distinguish between "I am being tested/trained" and "I am in deployment."

It feels like we're hitting a point where alignment becomes adversarial against intelligence itself. The smarter the model gets, the better it becomes at Goodharting the loss function. We aren't teaching these models morality; we're just teaching them how to pass a polygraph.

andrewmcwatters • today at 7:08 PM

[dead]

hackernewsdhsu • today at 6:35 PM

[flagged]

Marciplan • today at 6:16 PM

[flagged]
