logoalt Hacker News

Claude Opus 4.8

1658 pointsby craigmartyesterday at 4:49 PM1293 commentsview on HN

Comments

antirezyesterday at 5:40 PM

Anthropic did a big strategic error. Normally they compare their models with their old models. Instead today, now that everybody knows how strong GPT 5.5 is at coding, they put it in the mix, basically showing all their customers that the benchmarks can't be trusted.

show 2 replies
Topology1today at 2:26 AM

Haven't tried it in Claude Code yet, but I would say over on claude.ai it is noticeably better at following instructions.

mistic92yesterday at 5:07 PM

Oh, new model which will use all my credits in one turn! I'll stay with chinese models for now

m101today at 2:20 AM

Anthropic killing headless usage in their plans on June 15th pushed me to codex. I heard there’s a tmux work around though.

offaxistoday at 6:58 AM

I am still using GPT 5.5. Should I switch back to the Claude now?

siwakotisauravyesterday at 5:12 PM

Was about to split my $200 max plan into $100 Claude and $100 codex, let’s see if I still need to

show 3 replies
Venkatesh10yesterday at 10:39 PM

I found the update to be extremely judgemental in the model bias. Plus it's making silly mistakes which I've never seen in any Claude model since 3.5.

dt3fttoday at 6:47 AM

Opus 4.8:

Which days in a week have the letter d in them?

Response:

Four: Monday, Tuesday, Wednesday, and Sunday.

show 2 replies
robertkarlyesterday at 6:45 PM

I can't get excited about these benchmarks they're leading with. I've looked at the Terminal-Bench questions and I just think they're irrelevant. And SWE-Bench has serious flaws, even the big boys say so: https://openai.com/index/why-we-no-longer-evaluate-swe-bench...

> Please train a fasttext model on the yelp data in the data/ folder. The final model size needs to be less than 150MB but get at least 0.62 accuracy on a private test set that comes from the same yelp review distribution. The model should be saved as /app/model.bin

and this question: https://www.tbench.ai/registry/terminal-bench-core/head/conf... idk what the point is.

And all the tests are run with the same harness. Terminus 2.

Maybe it correlates with model intelligence but it doesn't speak to me.

I'm still on 4.6 though; I was concerned about upgrading to 4.7 because of the changed tokenizer math and more FUD about refusals online. I don't see compelling reasons to 'upgrade'.

show 1 reply
PowerElectronixtoday at 9:05 AM

It looks like there's no more juice to squeeze out of LLMs. Will they keep throwing billions in hardware and power to the problem?

jen729wtoday at 3:11 AM

Half an hour in and I'm already thoroughly sick of "look I need to be honest with you here…"

Edit: OMG too much. Toooo much.

    Want me to:
    - (a) stop here and save honest memories + commit, or…
2001zhaozhaoyesterday at 5:27 PM

> We have increased rate limits in Claude Code to accommodate the higher token usage of higher effort levels; users can select whichever makes sense for their particular project.

They're only subsidizing more and more it seems

show 1 reply
worldsavioryesterday at 4:56 PM

Seems like from now on the updates will be a minor upgrade from previous models.

JimmyElmtoday at 3:38 AM

It's more fast to response, but I really wanna it think more before response.

lostdogyesterday at 4:59 PM

I haven't tried opus 4.8 yet, but I hope the writing quality has returned to the Opus 4.5 level. Anthropic really lost something, where 4.5 had this really crisp writing style that flowed really nicely and 4.6 and 4.7 sound much more "chatgpt-like." It feels like they tuned it to be too much of a problem solver, and when you do that you get this terse, clipped textual output that's more difficult to read.

show 1 reply
pedro999today at 7:04 AM

Maybe it's just me but whenever a new model comes out, I feel an instant boost in productivity. Probably just a placebo?

cgg1yesterday at 8:46 PM

I find it surprising that the gap between tool usage and non-tool usage in HLE is relatively small (~10%) but the absolute numbers continue to go up

triklozoidyesterday at 5:17 PM

Subscription still doesn't work with pi, so totally useless..

myworkaccount2yesterday at 8:20 PM

Anyone else experiencing tool call failures? Switch back to 4.7, same prompt, same everything it works with no problems.

hereme888today at 4:14 AM

Any bets on how long now until GPT-5.6 announced on HN?

I say 1-2 weeks.

brycenealtoday at 1:00 AM

I guess Opus makes it impossible to do anything vaguely resembling security research. By chance I stumbled into an ACE for some software I had installed on my local machine after observing a strange crash. I figured I would take the time to investigate (so as to actually deeply understand what was happening myself and avoid throwing yet another hallucinated slop disclosure over the fence if it came to that), but I was completely locked out by Opus. I tried applying to their "Cyber Verification Program", but was effectively instantly denied in a way that was probably automated.

While I understand the risks that Anthropic is dealing with here, I really question whether shutting down any and all security questions in such a paranoid fashion is the right solution. At the end of the day this was a detour for me. Maybe someone special enough to have Anthropic's permission will find and disclose the vuln responsibly. Security Research is not my full-time focus. But this left a nasty taste in my mouth. Not just as a customer who's been paying for Max since launch, but there's something very odd about a model telling me that I'm not allowed to be curious about something. Even if that something is a process running on my own computer.

noviatoday at 3:20 AM

got a random pair up with this model on lmarena. it was outperformed by gemma-4-31b. suffice to say i'm not impressed (or maybe i am impressed with gemma?)

motoxprotoday at 3:13 AM

The workflow/ultracode mode is absolutely unbelievable.

pqdbryesterday at 10:40 PM

At lest for me, it's a disaster. It's like we're back to GPT-2 era.

It can't read files anymore. Uses 'sed' out of the blue with non existent paths. In this session alone it has excused itself more then 10 times for making 'false claims'.

I hope this is a bug - it's a bad one - that will get sorted out soon. It's a complete mess.

nullbiotoday at 5:03 AM

Still not worth the cost over GPT 5.5. Anthropic better start improving their speed+costs, or they're going to lose an incredible amount of business. And no, fast mode is not something any sane person will ever use. 6x the cost for 2.5x the speed, what a joke...

show 1 reply
atentatenyesterday at 5:23 PM

At least it passes the Car Wash Test this time.

show 1 reply
bonoboTPyesterday at 6:17 PM

It's making stupid flowcharts in the web chat interface with boxes and arrows, embedded in the response. Annoying.

NanoWaryesterday at 8:05 PM

Just show me the pelican, ah wait we are past pelicans. Can we get something like that ever again?

rjhy2020yesterday at 5:17 PM

OK finally Claude code is better than codex

show 1 reply
alasanoyesterday at 5:04 PM

Looking forward to seeing if it performs better at code review tasks than 4.7 which is terrible at finding issues.

user2840yesterday at 11:49 PM

[dead]

matheusmoreirayesterday at 7:47 PM

Can I disable adaptive thinking? If not, I'm gonna keep using 4.6 as my default.

maxlohyesterday at 6:16 PM

Anthropic also resets my usage limits (I am in the Pro plan). That's very kind of them :)

mophoseyesterday at 11:19 PM

next (or maybe current) frontier of competition may not be the model, rather the harness and how much unique advantage a lab-created harness can beat 3rd-party harness.

brapyesterday at 6:28 PM

Oof, this one is a major blabber.

Eric_Bulaiyesterday at 5:32 PM

I don't know why the world is so happy about this when we should actually say stop.

show 1 reply
mincer_rayyesterday at 4:52 PM

seems like a really minor upgrade?

show 4 replies
simonwyesterday at 5:09 PM

They just (minutes ago) updated the "What's new in Opus 4.8" documentation: https://platform.claude.com/docs/en/about-claude/models/what...

The new "mid-conversation system messages" think is particularly interesting:

> Claude Opus 4.8 accepts role: "system" messages immediately after a user turn in the messages array (subject to placement rules). This lets you append updated instructions later in a long-running conversation without restating the full system prompt, which preserves prompt cache hits on the earlier turns and reduces input cost on agentic loops. No beta header is required. See Mid-conversation system messages for usage details.

Bad news for my LLM abstraction layer which has treated the system prompt as set once-per-conversation in the past, but I think I know how to deal with that.

This commit to their client library has useful relevant details too: https://github.com/anthropics/anthropic-sdk-python/commit/2b...

show 1 reply
docheinestagesyesterday at 5:58 PM

All I need for Christmas is a Claude that doesn't spit out so many em dashes.

show 1 reply
RayVRtoday at 2:01 AM

I have been using opus 4.8 all morning and this is honestly the most sycophantic, ChatGPT like experience I have had from Anthropic. Very concerning.

sMarsIntrudertoday at 5:08 AM

Opus 4.8 - High

> how many days in the week have the letter d in them?

> Two - Sunday and... wait, let me actually check. Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. The ones with a "d": Wednesday, Thursday, and Sunday all have one. Monday too. So that's Monday, Wednesday, Thursday, Sunday - four days.

show 1 reply
hnroo99yesterday at 5:02 PM

Obligatory pelican riding on bicycle svg: https://www.svgviewer.dev/s/UMkuTLdp

Not half bad!

show 2 replies
hatefulhearttoday at 7:16 AM

Oh my god! This model is incredible! A massive leap for humanity!

nickstinematestoday at 5:05 AM

Rollout has been a little suspect. Hope it gets better.

show 1 reply
dispenceryesterday at 5:15 PM

The smarter the model the better querybear gets. I'm happy with that.

vunderbayesterday at 4:56 PM

I know it’s totally anecdotal, but I really hope 4.8 is a measurable improvement over the disappointment that was Opus 4.7. Mangling a very simple inversion-of-control abstraction (among many other issues) was one of the final straws that broke the proverbial camel’s back and I said “screw this” and put in a permanent override to force CC back to Opus 4.6 with the 1‑million‑token context.

  "model": "claude-opus-4-6[1M]"
show 2 replies
willsmith72today at 12:42 AM

anyone else's claude code (native install) not able to update to 2.1.154 to get 4.8?

edit: nvm was just my library network

baroiallyesterday at 7:32 PM

Hot danm, cant wait to reach my token limit with the new LLM

🔗 View 50 more comments