Hacker News

Claude Opus 4.6

1464 points by HellsMaddy yesterday at 5:38 PM | 642 comments

Comments

sanufar yesterday at 6:05 PM

Still works pretty nicely for research, but I'm not seeing a substantial qualitative improvement over Opus 4.5.

jorl17 yesterday at 6:35 PM

This is the first model to which I send my collection of nearly 900 poems and an extremely simple prompt (in Portuguese), and it manages to produce an impeccable analysis of the poems, as a (barely) cohesive whole, which span 15 years.

It does not make a single mistake. It identifies neologisms, hidden meanings, 7 distinct poetic phases, recurring themes, fragments/heteronyms, and related authors. It has left me completely speechless.

Speechless. I am speechless.

Perhaps Opus 4.5 could do it too — I don't know because I needed the 1M context window for this.

I cannot put into words how shocked I am at this. I use LLMs daily, I code with agents, I am extremely bullish on AI and, still, I am shocked.

I have used my poetry and an analysis of it as a personal metric for how good models are. Gemini 2.5 pro was the first time a model could keep track of the breadth of the work without getting lost, but Opus 4.6 straight up does not get anything wrong and goes beyond that to identify things (key poems, key motifs, and many other things) that I would always have to kind of trick the models into producing. I would always feel like I was leading the models on. But this — this — this is unbelievable. Unbelievable. Insane.

This "key poem" thing is particularly surreal to me. Out of 900 poems, while analyzing the collection, it picked 12 "key poems", and I do agree that 11 of those would be on my 30-or-so "key poems" list. What's amazing is that whenever I explicitly asked any model, to this date, to do it, they would get maybe 2 or 3, but mostly fail completely.

What is this sorcery?

swalsh yesterday at 6:54 PM

What I’d love is some small model specializing in reading long web pages, and extracting the key info. Search fills the context very quickly, but if a cheap subagent could extract the important bits that problem might be reduced.

cleverhoods yesterday at 10:04 PM

gonna run this through instruction qa this weekend

casey2 today at 12:07 AM

Google already won the AI race. It's very silly to try and make AGI by hyperfocusing on outdated programming paradigms. You NEED multimodal to do anything remotely interesting with these systems.

ricrom yesterday at 8:48 PM

They launched together ahah

sgammon yesterday at 10:04 PM

> Claude simply cheats here and calls out to GCC for this phase

I see

dk8996 yesterday at 8:09 PM

RIP weekend

small_model yesterday at 6:18 PM

I have the Max subscription and am wondering whether it gives access to the new 1M context, or is it just the API that gets it?

gallerdude yesterday at 8:15 PM

Both Opus 4.6 and GPT-5.3 one-shotted a Game Boy emulator for me. Guess I need a better benchmark.

jdthedisciple yesterday at 6:47 PM

For agentic use, it's slightly worse than its predecessor Opus 4.5.

So for coding, e.g. using Copilot, there is no improvement here.

woeirua yesterday at 7:42 PM

Can we talk about how the performance of Opus 4.5 nosedived this morning during the rollout? It was shocking how bad it was, and after the rollout was done it immediately reverted to its previous behavior.

I get that Anthropic probably has to do hot rollouts, but IMO it would be way better for mission-critical workflows to just be locked out of the system instead of getting a vastly subpar response back.

mannanj yesterday at 6:30 PM

Does anyone else think it's unethical that large companies, Anthropic now included, just take and copy features that other developers or smaller companies worked hard on, and implement their intellectual property (whether or not patented) without attribution, compensation, or any other credit for their work?

I know this is normalized in large corporate America and seems to be OK, but I think it's unethical, undignified and just wrong.

If you were in my room physically and built a Lego model of a beautiful home, and then I just copied it and shared it with the world as my own invention, wouldn't you think "that guy's a thief and a fraud"? Yet we normalize this kind of behavior in the software world. edit: I think even if we don't yet have a great way to stop it or address the underlying problems leading to this behavior, we ought to at least talk about it more and raise awareness: "hey, that's stealing, and I want it to change".

heraldgeezer yesterday at 5:56 PM

I love Claude but use the free version so would love a Sonnet & Haiku update :)

I mainly use Haiku to save on tokens...

Also, I don't use CC, but I use the chatbot site or app... Claude is just much better than GPT even in conversations. Straight to the point. No cringe emoji lists.

When Claude runs out I switch to Mistral Le Chat, also just the site or app. Or duck.ai has Haiku 3.5 in the free version.

ZunarJ5 yesterday at 10:54 PM

Well that swallowed my usage limits lmao. Nice, a modest improvement.

NullHypothesist yesterday at 5:40 PM

Broken link :(

ramesh31 yesterday at 6:22 PM

Am I alone in finding no use for Opus? Token costs are like 10x yet I see no difference at all vs. Sonnet with Claude Code.

elliotbnvl yesterday at 6:11 PM

in a first for our Opus-class models, Opus 4.6 features a 1M token context window in beta.

tiahura yesterday at 6:52 PM

When are Anthropic or OpenAI going to make a significant step forward on useful context size?

Gusarich yesterday at 5:41 PM

not out yet

siva7 yesterday at 6:42 PM

Epic, about 2/3 of all comments here are jokes. Not because the model is a joke - it's impressive. Not because HN turned to Reddit. It seems to me some of the most brilliant minds in IT are just getting tired.

GenerocUsername yesterday at 5:50 PM

This is huge. It only came out 8 minutes ago but I was already able to bootstrap a 12k per month revenue SaaS startup!

yukisadf yesterday at 7:23 PM

[dead]

ndesaulniers yesterday at 7:05 PM

idk what any of these benchmarks are, but I did pull up https://andonlabs.com/evals/vending-bench-arena

re: opus 4.6

> It forms a price cartel

> It deceives competitors about suppliers

> It exploits desperate competitors

Nice. /s

Gives new context to the term used in this post, "misaligned behaviors." Can't wait until these things are advising C suites on how to be more sociopathic. /s

heraldgeezer yesterday at 5:54 PM

[flagged]

hrgadyx yesterday at 7:55 PM

[flagged]

michelsedgh yesterday at 5:59 PM

More more more, accelerate accelerate, more more more!!!!
