Claude Opus 4.8

1658 points • by craigmart • yesterday at 4:49 PM • 1293 comments • view on HN

Comments

Anthropic did a big strategic error. Normally they compare their models with their old models. Instead today, now that everybody knows how strong GPT 5.5 is at coding, they put it in the mix, basically showing all their customers that the benchmarks can't be trusted.

➕ show 2 replies

Topology1 • today at 2:26 AM

Haven't tried it in Claude Code yet, but I would say over on claude.ai it is noticeably better at following instructions.

mistic92 • yesterday at 5:07 PM

Oh, new model which will use all my credits in one turn! I'll stay with chinese models for now

m101 • today at 2:20 AM

Anthropic killing headless usage in their plans on June 15th pushed me to codex. I heard there’s a tmux work around though.

offaxis • today at 6:58 AM

I am still using GPT 5.5. Should I switch back to the Claude now?

siwakotisaurav • yesterday at 5:12 PM

Was about to split my $200 max plan into $100 Claude and $100 codex, let’s see if I still need to

➕ show 3 replies

Venkatesh10 • yesterday at 10:39 PM

I found the update to be extremely judgemental in the model bias. Plus it's making silly mistakes which I've never seen in any Claude model since 3.5.

dt3ft • today at 6:47 AM

Opus 4.8:

Which days in a week have the letter d in them?

Response:

Four: Monday, Tuesday, Wednesday, and Sunday.

➕ show 2 replies

robertkarl • yesterday at 6:45 PM

I can't get excited about these benchmarks they're leading with. I've looked at the Terminal-Bench questions and I just think they're irrelevant. And SWE-Bench has serious flaws, even the big boys say so: https://openai.com/index/why-we-no-longer-evaluate-swe-bench...

> Please train a fasttext model on the yelp data in the data/ folder. The final model size needs to be less than 150MB but get at least 0.62 accuracy on a private test set that comes from the same yelp review distribution. The model should be saved as /app/model.bin

and this question: https://www.tbench.ai/registry/terminal-bench-core/head/conf... idk what the point is.

And all the tests are run with the same harness. Terminus 2.

Maybe it correlates with model intelligence but it doesn't speak to me.

I'm still on 4.6 though; I was concerned about upgrading to 4.7 because of the changed tokenizer math and more FUD about refusals online. I don't see compelling reasons to 'upgrade'.

➕ show 1 reply

PowerElectronix • today at 9:05 AM

It looks like there's no more juice to squeeze out of LLMs. Will they keep throwing billions in hardware and power to the problem?

jen729w • today at 3:11 AM

Half an hour in and I'm already thoroughly sick of "look I need to be honest with you here…"

Edit: OMG too much. Toooo much.

    Want me to:
    - (a) stop here and save honest memories + commit, or…

2001zhaozhao • yesterday at 5:27 PM

> We have increased rate limits in Claude Code to accommodate the higher token usage of higher effort levels; users can select whichever makes sense for their particular project.

They're only subsidizing more and more it seems

➕ show 1 reply

worldsavior • yesterday at 4:56 PM

Seems like from now on the updates will be a minor upgrade from previous models.

JimmyElm • today at 3:38 AM

It's more fast to response, but I really wanna it think more before response.

lostdog • yesterday at 4:59 PM

I haven't tried opus 4.8 yet, but I hope the writing quality has returned to the Opus 4.5 level. Anthropic really lost something, where 4.5 had this really crisp writing style that flowed really nicely and 4.6 and 4.7 sound much more "chatgpt-like." It feels like they tuned it to be too much of a problem solver, and when you do that you get this terse, clipped textual output that's more difficult to read.

➕ show 1 reply

pedro999 • today at 7:04 AM

Maybe it's just me but whenever a new model comes out, I feel an instant boost in productivity. Probably just a placebo?

cgg1 • yesterday at 8:46 PM

I find it surprising that the gap between tool usage and non-tool usage in HLE is relatively small (~10%) but the absolute numbers continue to go up

triklozoid • yesterday at 5:17 PM

Subscription still doesn't work with pi, so totally useless..

myworkaccount2 • yesterday at 8:20 PM

Anyone else experiencing tool call failures? Switch back to 4.7, same prompt, same everything it works with no problems.

hereme888 • today at 4:14 AM

Any bets on how long now until GPT-5.6 announced on HN?

I say 1-2 weeks.

bryceneal • today at 1:00 AM

I guess Opus makes it impossible to do anything vaguely resembling security research. By chance I stumbled into an ACE for some software I had installed on my local machine after observing a strange crash. I figured I would take the time to investigate (so as to actually deeply understand what was happening myself and avoid throwing yet another hallucinated slop disclosure over the fence if it came to that), but I was completely locked out by Opus. I tried applying to their "Cyber Verification Program", but was effectively instantly denied in a way that was probably automated.

While I understand the risks that Anthropic is dealing with here, I really question whether shutting down any and all security questions in such a paranoid fashion is the right solution. At the end of the day this was a detour for me. Maybe someone special enough to have Anthropic's permission will find and disclose the vuln responsibly. Security Research is not my full-time focus. But this left a nasty taste in my mouth. Not just as a customer who's been paying for Max since launch, but there's something very odd about a model telling me that I'm not allowed to be curious about something. Even if that something is a process running on my own computer.

novia • today at 3:20 AM

got a random pair up with this model on lmarena. it was outperformed by gemma-4-31b. suffice to say i'm not impressed (or maybe i am impressed with gemma?)

motoxpro • today at 3:13 AM

The workflow/ultracode mode is absolutely unbelievable.

pqdbr • yesterday at 10:40 PM

At lest for me, it's a disaster. It's like we're back to GPT-2 era.

It can't read files anymore. Uses 'sed' out of the blue with non existent paths. In this session alone it has excused itself more then 10 times for making 'false claims'.

I hope this is a bug - it's a bad one - that will get sorted out soon. It's a complete mess.

nullbio • today at 5:03 AM

Still not worth the cost over GPT 5.5. Anthropic better start improving their speed+costs, or they're going to lose an incredible amount of business. And no, fast mode is not something any sane person will ever use. 6x the cost for 2.5x the speed, what a joke...

➕ show 1 reply

atentaten • yesterday at 5:23 PM

At least it passes the Car Wash Test this time.

➕ show 1 reply

bonoboTP • yesterday at 6:17 PM

It's making stupid flowcharts in the web chat interface with boxes and arrows, embedded in the response. Annoying.

NanoWar • yesterday at 8:05 PM

Just show me the pelican, ah wait we are past pelicans. Can we get something like that ever again?

rjhy2020 • yesterday at 5:17 PM

OK finally Claude code is better than codex

➕ show 1 reply

alasano • yesterday at 5:04 PM

Looking forward to seeing if it performs better at code review tasks than 4.7 which is terrible at finding issues.

user2840 • yesterday at 11:49 PM

[dead]

matheusmoreira • yesterday at 7:47 PM

Can I disable adaptive thinking? If not, I'm gonna keep using 4.6 as my default.

maxloh • yesterday at 6:16 PM

Anthropic also resets my usage limits (I am in the Pro plan). That's very kind of them :)

mophose • yesterday at 11:19 PM

next (or maybe current) frontier of competition may not be the model, rather the harness and how much unique advantage a lab-created harness can beat 3rd-party harness.

brap • yesterday at 6:28 PM

Oof, this one is a major blabber.

Eric_Bulai • yesterday at 5:32 PM

I don't know why the world is so happy about this when we should actually say stop.

➕ show 1 reply

mincer_ray • yesterday at 4:52 PM

seems like a really minor upgrade?

➕ show 4 replies

simonw • yesterday at 5:09 PM

They just (minutes ago) updated the "What's new in Opus 4.8" documentation: https://platform.claude.com/docs/en/about-claude/models/what...

The new "mid-conversation system messages" think is particularly interesting:

> Claude Opus 4.8 accepts role: "system" messages immediately after a user turn in the messages array (subject to placement rules). This lets you append updated instructions later in a long-running conversation without restating the full system prompt, which preserves prompt cache hits on the earlier turns and reduces input cost on agentic loops. No beta header is required. See Mid-conversation system messages for usage details.

Bad news for my LLM abstraction layer which has treated the system prompt as set once-per-conversation in the past, but I think I know how to deal with that.

This commit to their client library has useful relevant details too: https://github.com/anthropics/anthropic-sdk-python/commit/2b...

➕ show 1 reply

docheinestages • yesterday at 5:58 PM

All I need for Christmas is a Claude that doesn't spit out so many em dashes.

➕ show 1 reply

RayVR • today at 2:01 AM

I have been using opus 4.8 all morning and this is honestly the most sycophantic, ChatGPT like experience I have had from Anthropic. Very concerning.

sMarsIntruder • today at 5:08 AM

Opus 4.8 - High

> how many days in the week have the letter d in them?

> Two - Sunday and... wait, let me actually check. Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. The ones with a "d": Wednesday, Thursday, and Sunday all have one. Monday too. So that's Monday, Wednesday, Thursday, Sunday - four days.

➕ show 1 reply

hnroo99 • yesterday at 5:02 PM

Obligatory pelican riding on bicycle svg: https://www.svgviewer.dev/s/UMkuTLdp

Not half bad!

➕ show 2 replies

hatefulheart • today at 7:16 AM

Oh my god! This model is incredible! A massive leap for humanity!

nickstinemates • today at 5:05 AM

Rollout has been a little suspect. Hope it gets better.

➕ show 1 reply

dispencer • yesterday at 5:15 PM

The smarter the model the better querybear gets. I'm happy with that.

vunderba • yesterday at 4:56 PM

I know it’s totally anecdotal, but I really hope 4.8 is a measurable improvement over the disappointment that was Opus 4.7. Mangling a very simple inversion-of-control abstraction (among many other issues) was one of the final straws that broke the proverbial camel’s back and I said “screw this” and put in a permanent override to force CC back to Opus 4.6 with the 1‑million‑token context.

  "model": "claude-opus-4-6[1M]"

➕ show 2 replies

willsmith72 • today at 12:42 AM

anyone else's claude code (native install) not able to update to 2.1.154 to get 4.8?

edit: nvm was just my library network

baroiall • yesterday at 7:32 PM

Hot danm, cant wait to reach my token limit with the new LLM

alt Hacker News

Claude Opus 4.8

Comments

🔗 View 50 more comments