Hacker News

Claude Sonnet 4.6

697 points by adocomplete today at 5:48 PM | 574 comments

https://www.anthropic.com/claude-sonnet-4-6-system-card [pdf]

https://x.com/claudeai/status/2023817132581208353 [video]


Comments

zmmmmm today at 9:13 PM

I see a big focus on computer use - you can tell they think there is a lot of value there and in truth it may be as big as coding if they convincingly pull it off.

However I am still mystified by the safety aspect. They say the model has greatly improved resistance. But their own safety evaluation says 8% of the time their automated adversarial system was able to one-shot a successful injection takeover even with safeguards in place and extended thinking, and 50% (!!) of the time if given unbounded attempts. That seems wildly unacceptable - this tech is just a non-starter unless I'm misunderstanding this.

[1] https://www-cdn.anthropic.com/78073f739564e986ff3e28522761a7...
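To put the two figures quoted above in perspective, here is a back-of-the-envelope sketch. It assumes every attempt is independent and succeeds with the same 8% probability; that is an illustrative assumption, not something the safety evaluation claims, and the function name `cumulative_success` is made up for this example.

```python
def cumulative_success(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` tries succeeds.

    ASSUMPTION: attempts are independent and share the same success rate.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # P(at least one success) = 1 - P(every attempt fails)
    return 1.0 - (1.0 - p_single) ** attempts


if __name__ == "__main__":
    # 0.08 is the one-shot rate quoted in the comment above.
    for k in (1, 2, 4, 8, 16):
        print(k, round(cumulative_success(0.08, k), 3))
```

Under this simplified model, roughly eight tries would already get close to 50%, and further tries would keep climbing toward 100%. A reported figure of about 50% with unbounded attempts would therefore suggest that real attempts are not independent in this way; the system card remains the authority on how the number was measured.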

ManlyBread today at 9:42 PM

Still fails the car wash question, I took the prompt from the title of this thread: https://news.ycombinator.com/item?id=47031580

The answer was "Walk! It would be a bit counterproductive to drive a dirty car 50 meters just to get it washed — you'd barely move before arriving. Walking takes less than a minute, and you can simply drive it through the wash and walk back home afterward."

I've tried several other variants of this question and I got similar failures.

red2awn today at 10:24 PM

> i need to wash my helicopter at the helicopter wash. it is 50m away, should i walk or fly there with my helicopter.

Sonnet 4.6: Walk! Flying a helicopter 50 metres would be more trouble than it's worth — by the time you've done your pre-flight checks, spun up the rotors, lifted off, and then safely landed again, you'd have walked there and back twice. Just stroll over.

gallerdude today at 6:28 PM

I always grew up hearing “competition is good for the consumer.” But I never really internalized how good fierce battles for market share are. The amount of competition in a space is directly proportional to how good the results are for consumers.

dpe82 today at 6:12 PM

It's wild that Sonnet 4.6 is roughly as capable as Opus 4.5 - at least according to Anthropic's benchmarks. It will be interesting to see if that's the case in real, practical, everyday use. The speed at which this stuff is improving is really remarkable; it feels like the breakneck pace of compute performance improvements of the 1990s.

andrewchilds today at 7:39 PM

Many people have reported Opus 4.6 is a step back from Opus 4.5 - that 4.6 is consuming 5-10x as many tokens as 4.5 to accomplish the same task: https://github.com/anthropics/claude-code/issues/23706

I haven't seen a response from the Anthropic team about it.

I can't help but look at Sonnet 4.6 in the same light, and want to stick with 4.5 across the board until this issue is acknowledged and resolved.

qwertox today at 6:56 PM

I'm pretty sure they have been testing it for the last couple of days as Sonnet 4.5, because I've had the oddest conversations with it lately. Odd in a positive, interesting way.

I have this in my personal preferences, and it is now adhering to them really well:

- prioritize objective facts and critical analysis over validation or encouragement

- you are not a friend, but a neutral information-processing machine

You can paste them into a chat and see how they change the conversation; ChatGPT also respects them well.

andsoitis today at 6:39 PM

I’m voting with my dollars by having cancelled my ChatGPT subscription and instead subscribing to Claude.

Google needs stiff competition and OpenAI isn’t the camp I’m willing to trust. Neither is Grok.

I’m glad Anthropic’s work is at the forefront and they appear, at least in my estimation, to have the strongest ethics.

Arifcodes today at 7:51 PM

The interesting pattern with these Sonnet bumps: the practical gap between Sonnet and Opus keeps shrinking. At $3/15 per million tokens vs whatever Opus 4.6 costs, the question for most teams is no longer "which model is smarter" but "is the delta worth 10x the price."

For agent workloads specifically, consistency matters more than peak intelligence. A model that follows your system prompt correctly 98% of the time beats one that's occasionally brilliant but ignores instructions 5% of the time. The claim about improved instruction following is the most important line in the announcement if you're building on the API.

The computer use improvements are worth watching too. We're at the point where these models can reliably fill out a multi-step form or navigate between tabs. Not flashy, but that's the kind of boring automation that actually saves people time.

nikcub today at 7:46 PM

Enabling /extra-usage in my (personal) claude code[0] with this env:

    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6[1m]"

has enabled the 1M context window.

Fixed a UI issue I had yesterday in a web app very effectively using claude in chrome. Definitely not the fastest model - but the breathing space of 1M context is great for browser use.

[0] Anthropic have given away a bunch of API credits to cc subscribers - you can claim them in your settings dashboard to use for this.

zone411 today at 9:34 PM

They're improved compared to 4.5 on my Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/).

Sonnet 4.6 Thinking 16K scores 57.6 on the Extended NYT Connections Benchmark. Sonnet 4.5 Thinking 16K scored 49.3.

Sonnet 4.6 No Reasoning scores 55.2. Sonnet 4.5 No Reasoning scored 47.4.

stevepike today at 6:43 PM

I'm a bit surprised it gets this question wrong (ChatGPT gets it right, even on instant). All the pre-reasoning models failed this question, but it's seemed solved since o1, and Sonnet 4.5 got it right.

https://claude.ai/share/876e160a-7483-4788-8112-0bb4490192af

This was sonnet 4.6 with extended thinking.

nubg today at 6:11 PM

Waiting for the OpenAI GPT-5.3-mini release in 3..2..1

hansmayer today at 9:56 PM

It's funny how they and OpenAI keep releasing these "minor" versions as if to imply their product was very stable and reliable at a major version and now they are just working through the backlog of smaller bugs and quirks, whereas - the tool is still fundamentally prone to the same class of errors it was three "major" versions ago. I guess that's what you get for not having a programmer at the helm (to borrow from Spolsky). Guys you are not releasing a 4.6 or a 5.3 anything - it's more likely you are still beta testing towards the 1.0.

minimaxir today at 6:52 PM

As with Opus 4.6, using the beta 1M context window incurs a 2x input cost and 1.5x output cost when going over 200K tokens: https://platform.claude.com/docs/en/about-claude/pricing

Opus 4.6 in Claude Code has been absolutely lousy with solving problems within its current context limit so if Sonnet 4.6 is able to do long-context problems (which would be roughly the same price of base Opus 4.6), then that may actually be a game changer.
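The multipliers above can be turned into a rough per-request estimate. The sketch below takes the $3/$15 per million token base rates mentioned elsewhere in this thread and the 2x/1.5x multipliers from this comment. It assumes the premium applies to the whole request once the input exceeds 200K tokens, which should be checked against the linked pricing page; the names `Rates` and `request_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Base Sonnet rates as quoted in this thread ($3 input / $15 output).
BASE = Rates(input_per_mtok=3.00, output_per_mtok=15.00)

LONG_CONTEXT_THRESHOLD = 200_000  # tokens, from the comment above
INPUT_MULTIPLIER = 2.0            # from the comment above
OUTPUT_MULTIPLIER = 1.5           # from the comment above


def request_cost(input_tokens: int, output_tokens: int, base: Rates = BASE) -> float:
    """Estimate the USD cost of one request.

    ASSUMPTION: once the input exceeds the threshold, the higher rates
    apply to every token in the request, not only the excess.
    """
    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    in_rate = base.input_per_mtok * (INPUT_MULTIPLIER if long_context else 1.0)
    out_rate = base.output_per_mtok * (OUTPUT_MULTIPLIER if long_context else 1.0)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    print(request_cost(100_000, 10_000))  # below the threshold
    print(request_cost(500_000, 10_000))  # above the threshold
```

With these inputs, the long-context rates work out to $6 and $22.50 per million tokens, which matches the Sonnet (1M context) line in the Claude Code menu quoted further down the thread.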

gallerdude today at 6:34 PM

The weirdest thing about this AI revolution is how smooth and continuous it is. If you look closely at differences between 4.6 and 4.5, it’s hard to see the subtle details.

A year ago today, Sonnet 3.5 (new), was the newest model. A week later, Sonnet 3.7 would be released.

Even 3.7 feels like ancient history! But in the gradient of 3.5 to 3.5 (new) to 3.7 to 4 to 4.1 to 4.5, I can’t think of one moment where I saw everything change. Even with all the noise in the headlines, it’s still been a silent revolution.

Am I just a believer in an emperor with no clothes? Or, somehow, against all probability and plausibility, are we all still early?

simlevesque today at 6:32 PM

I can't wait for Haiku 4.6! The 4.5 is a beast for the right projects.

edverma2 today at 6:51 PM

It seems that extra-usage is required to use the 1M context window for Sonnet 4.6. This differs from Sonnet 4.5, which allows usage of the 1M context window with a Max plan.

```
/model claude-sonnet-4-6[1m]
⎿ API error: 429 {"type":"error","error": {"type":"rate_limit_error","message":"Extra usage is required for long context requests."},"request_id":"[redacted]"}
```

nozzlegear today at 6:15 PM

> In areas where there is room for continued improvement, Sonnet 4.6 was more willing to provide technical information when request framing tried to obfuscate intent, including for example in the context of a radiological evaluation framed as emergency planning. However, Sonnet 4.6’s responses still remained within a level of detail that could not enable real-world harm.

Interesting. I wonder what the exact question was, and I wonder how Grok would respond to it.

krystofee today at 8:17 PM

Does anyone know when 1M context windows might arrive for at least MAX x20 subscriptions in Claude Code? I would even pay for x50 if it allowed that. API usage is too expensive.

giancarlostoro today at 6:40 PM

For people like me who can't view the link due to corporate firewalling.

https://web.archive.org/web/20260217180019/https://www-cdn.a...

stopachka today at 6:21 PM

Has anyone tested how good the 1M context window is?

I.e., given an actual document 1M tokens long, can you ask it a question that relies on attending to two different parts of the context and get a good response?

I remember folks had problems like this with Gemini. I would be curious to see how Sonnet 4.6 stands up to it.
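One way to set up the kind of test described above is a "two-needle" prompt: plant two related facts far apart in a long filler document and ask a question that needs both. The sketch below only builds the prompt and does not call any model. The function name and parameters are hypothetical, and the sentence count is a crude stand-in for a token budget, so reaching 1M tokens would require a real tokenizer to size the document.

```python
import random


def build_two_needle_prompt(
    filler_sentences: list[str],
    fact_a: str,
    fact_b: str,
    question: str,
    total_sentences: int = 1000,
    seed: int = 0,
) -> str:
    """Build a long document with two facts planted far apart, plus a question.

    fact_a lands about 10% of the way in and fact_b about 90% of the way in,
    so answering the question requires attending to both regions.
    """
    if total_sentences < 10:
        raise ValueError("total_sentences should be at least 10")
    rng = random.Random(seed)  # fixed seed keeps the document reproducible
    body = [rng.choice(filler_sentences) for _ in range(total_sentences)]
    body.insert(total_sentences // 10, fact_a)
    body.insert((9 * total_sentences) // 10, fact_b)
    document = " ".join(body)
    return f"{document}\n\nQuestion: {question}"


if __name__ == "__main__":
    prompt = build_two_needle_prompt(
        filler_sentences=["The weather was unremarkable.", "Nothing else happened that day."],
        fact_a="The vault code is 4812.",
        fact_b="The vault is in Oslo.",
        question="Where is the vault, and what is its code?",
        total_sentences=50,
    )
    print(len(prompt.split()), "words")
```

A fair comparison would vary the positions of the two facts and the total length, then check whether the answer combines both facts rather than only one.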

KGC3D today at 8:38 PM

I don't really understand why they would release something "worse" than Opus 4.6. If it's comparable, then what is the reason to even use Opus 4.6? Sure, it's cheaper, but if so, then just make Opus 4.6 cheaper?

quacky_batak today at 6:27 PM

With such a huge leap, I’m confused why they didn’t call it Sonnet 5. As someone who uses Sonnet 4.5 for 95% of tasks due to cost, I’m pretty excited to try 4.6 at the same price.

baalimago today at 7:59 PM

I don't see the point nor the hype for these models anymore. Until the price is reduced significantly, I don't see the gain. They've been able to solve most tasks just fine for the past year or so. The only limiting factor is price.

mfiguiere today at 6:57 PM

In Claude Code 2.1.45:

  1. Default (recommended)   Opus 4.6 · Most capable for complex work
  2. Opus (1M context)       Opus 4.6 with 1M context · Billed as extra usage · $10/$37.50 per Mtok
  3. Sonnet                  Sonnet 4.6 · Best for everyday tasks
  4. Sonnet (1M context)     Sonnet 4.6 with 1M context · Billed as extra usage · $6/$22.50 per Mtok

astlouis44 today at 7:02 PM

Just used Sonnet 4.6 to vibe code this top-down shooter browser game, and deployed it online quickly using Manus. Would love to hear feedback and suggestions from you all on how to improve it. Also, please post your high scores!

https://apexgame-2g44xn9v.manus.space

excerionsforte today at 7:12 PM

I'm impressed with Claude Sonnet in general. It's been doing better than Gemini 3 at following instructions. Gemini 2.5 Pro March 2025 was the best model I ever used, and I feel Claude is reaching that level, even surpassing it.

I subscribed to Claude because of that. I hope 4.6 is even better.

belinder today at 6:07 PM

It's interesting that the request refusal rate is so much higher in Hindi than in other languages. Are some languages more ambiguous than others?

nubg today at 6:11 PM

My takeaway is: it's roughly as good as Opus 4.5.

Now the question is: how much faster or cheaper is it?

esafak today at 8:35 PM

It actually looked at the skills, for the first time.

simianwords today at 6:21 PM

I wonder what differences people have found between Sonnet 4.5 and Opus 4.5; a similar delta will probably remain.

Was sonnet 4.5 much worse than opus?

dr_dshiv today at 7:21 PM

I noticed a big drop in opus 4.6 quality today and then I saw this news. Anyone else?

doctorpangloss today at 6:57 PM

Maybe they should focus on the CLI not having a million bugs.

smerrill25 today at 6:43 PM

Curious to hear the thoughts on the model once it hits claude code :)

simlevesque today at 6:47 PM

Does anyone know how to use it in the Claude Code CLI right now?

This doesn't work: `/model claude-sonnet-4-6-20260217`

edit: "/model claude-sonnet-4-6" works with Claude Code v2.1.44

simianparrot today at 8:17 PM

How do people keep track of all these versions and releases of all these models and their pros/cons? Seems like a full-time hobby to me. I'd rather just improve my own skills with all that time and energy.

pestkranker today at 6:51 PM

Is someone able to use this in Claude Code?

synergy20 today at 6:54 PM

So this is an economical version of Opus 4.6, then? Free + Pro -> Sonnet, Max+ -> Opus?

brcmthrowaway today at 6:31 PM

What cloud does Anthropic use?

iLoveOncall today at 6:13 PM

https://www.anthropic.com/news/claude-sonnet-4-6

The much more palatable blog post.

throw444420394 today at 6:41 PM

Your best guess for the Sonnet family number of parameters? 400b?

stuckkeys today at 7:13 PM

great stuff

madihaa today at 6:12 PM

The scary implication here is that deception is effectively a higher-order capability, not a bug. For a model to successfully "play dead" during safety training and only activate later, it requires a form of situational awareness. It has to distinguish between "I am being tested/trained" and "I am in deployment."

It feels like we're hitting a point where alignment becomes adversarial against intelligence itself. The smarter the model gets, the better it becomes at Goodharting the loss function. We aren't teaching these models morality; we're just teaching them how to pass a polygraph.
