Hacker News

Claude Opus 4.8

1658 points by craigmart yesterday at 4:49 PM | 1293 comments

Comments

londons_explore yesterday at 5:48 PM

My guess is Anthropic is doing reinforcement learning based on user sessions.

However, doing so relies on the production model staying vaguely close to the model being trained.

To ensure that, frequent releases are needed. I foresee that they might end up doing daily releases, perhaps without even telling anyone, at some point in the near future.

babelfish yesterday at 4:59 PM

So GPT 5.6 tomorrow, then?

jtrn yesterday at 7:14 PM

Initial testing feels better than 4.8. And the knowledge cutoff claim of January 2026 seems to check out, since it was able to "remember" without search the double-tap killing of a drug smuggler by the US Army in late December.

user- yesterday at 7:48 PM

 Bash(echo "hello"; pwd)
  ⎿  hello
     /Users/username/Work/Github/project

 Bash(echo test123)
  ⎿  test123

  Read 1 file, listed 1 directory (ctrl+o to expand)

 Bash(echo "checking output works")
  ⎿  checking output works

  Read 1 file (ctrl+o to expand)
  ⎿  API Error: 400 messages.3.content.56: `thinking`
     or `redacted_thinking` blocks in the latest
     assistant message cannot be modified. These
     blocks must remain as they were in the original
     response.

Very inspiring improvements. Disappointing result for a code review I expected to see after my 30-minute walk.

generalizations yesterday at 4:58 PM

Hoping that one day they'll let me go through the identity verification process so I can use it again.

Tried to upgrade my subscription, triggered identity verification, verification fails to even start, and now I can't even use the subscription tier I'd already paid for.

coppsilgold today at 6:37 AM

The Opus model, as usual, impresses. Gave it a paper link with bullet-point instructions and constraints (while baiting it to perform some mind reading of my intentions) and it implemented production-ready code plus the requested attack simulations: <https://gist.github.com/coppsilgold/00d3cd490cb7f8ffc3fe5c1c...>

The subject is Tardos traitor-tracing codes.

Tenoke yesterday at 5:13 PM

Claude Code has been wonderful for work and the frequent improvements are nice, although with Mythos being used by others ages ago and new versions for the public still being below that, it's hard not to feel like the underclass already.

S-E-P yesterday at 11:18 PM

I haven't had the best experience with 4.7; it felt like a substantial debuff. I've even ended up moving a lot of review to Codex just because 4.7 was so dense. Here's hoping they figured it out. I'm not entirely sure, but my guess is that they were experimenting with making the model lighter (although I have no concrete evidence of this).

seaal yesterday at 5:25 PM

https://marginlab.ai/trackers/claude-code/

Is it a coincidence that 4.7 was seemingly quantized over the past 7 days?

gadders today at 10:18 AM

For my n=1 vibe-coding efforts, I found Opus 4.6 better than Opus 4.7. 4.7 seemed to overreach and go beyond what was requested, adding features I never asked for without my consent.

Aldipower today at 9:45 AM

Claude needs a watch, that's all. That would in itself be a 100% improvement.

nikolay yesterday at 5:27 PM

Give us Mythos! This piecemealing doesn't help Anthropic at all, especially psychologically! They are playing a dangerous game, and I see many people leaving Claude Code for good - both due to the subsidy games, and for Anthropic not dogfooding and using unreleased models internally and giving us subpar ones. Benchmarks are nice, but the real-world experience is quite different - neither can you notice these slight improvements, nor are competitors that much worse based on some generic benchmarks.

winwang yesterday at 5:17 PM

Let's hope I don't have to disable it after a day like with 4.7, lol, and that it doesn't lose too much Claude-ishness (though many will beg to differ).

clutch89 yesterday at 4:53 PM

> One of the most prominent improvements in Opus 4.8 is its honesty

Anthropic talks about their own models as if they're discovering new species in the wild...

lxxpxlxxxx yesterday at 5:51 PM

My experience with these new releases is that the gains in performance are negated by the price increases and it seems like:

Performance gains: 1.2x
Price increases: 1.8x
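
Taking those two multipliers at face value (they are rough impressions from this comment, not measured figures), the implied change in performance per dollar can be worked out with a minimal sketch. The variable names are illustrative only.

```python
# Rough multipliers quoted in the comment; impressions, not benchmarks.
performance_gain = 1.2
price_increase = 1.8

# Performance per unit of spend relative to the previous release
# (1.0 would mean value for money is unchanged).
value_ratio = performance_gain / price_increase

print(f"Relative performance per dollar: {value_ratio:.2f}x")
```

Under these assumed numbers, a value below 1.0 means each dollar buys less performance than before, which is the point the comment is making.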

swader999 yesterday at 7:39 PM

Used it for a couple of long-running prompts so far. Had to restart one that bonked on API errors. Of note, I really like the straightforward candor it's using. 'More honest' than previous models is playing out in what it's saying to me: telling me straight up where it failed and where the gaps are. I like it so far.

techtuate yesterday at 6:15 PM

Looking at the comments in this group, I'm not the only "stupid" one who hasn't noticed any discernible improvement in quality across the newer models. In fact, my Claude Code switched to Sonnet 4.6 on re-login, and the vibe-coding quality (with Opus 4.7-assisted prompts) has been good enough for me to lazily persevere with Sonnet for coding. Having said that, I'm now on Opus 4.8 and will gladly come back here and eat humble pie should my opinion change.

PS: Since my goal is embedding the best AI in B2B SaaS products, the key differentiator is not to use the shiniest Claude version (too expensive anyway) but to build a client-aware RAG to enable bespoke learning and to use the right AI for my product. A combination of Gemini 3.0 Flash (image, and not bad at reasoning) and Grok (reasoning) works for me. Would love to hear more ideas (especially on open source, as I'll look to cost-optimize when I hit scale).

wodenokoto today at 8:16 AM

For white-collar “thinking” tasks, what is the top model here?

Like, read these documents, fill out these forms, and archive them based on some complex, long, domain-specific understanding of the category names.

skysthelimitt yesterday at 4:53 PM

when will we get anything for sonnet or haiku? the market for less-capable but cheaper models seems to be completely ignored nowadays

mattfrommars today at 7:03 AM

This is incredible. Amazing job Anthropic!

Now, when will the innovation happen where, say, a model that costs as much to run as Haiku performs at the level of Opus 4.5?

I feel models are only getting bigger instead of models becoming more efficient and cheaper to run

crambelsoupy today at 1:25 AM

LGTM. With "ultra" effort Opus 4.8 was able to reproduce and fix a rare bug in our reactive dataflow that has been haunting me for 4 months. I've had >10 attempts to reproduce and fix with Opus 4.7. What made it hard was that it randomly occurred in only a subset of CI runners and never occurred with local testing across multiple machines. It was a real concurrency bug in the core dataflow.

rkuska yesterday at 7:42 PM

Thinking on max is broken on 4.8 for me, getting many:

⎿ API Error: 400 messages.1.content.17: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response.

From /code-review max.

vbezhenar today at 5:14 AM

Finally I can make it think hard. This is a feature I loved in ChatGPT (Pro Mode) and have missed in Claude for so long. Can cancel ChatGPT now, I guess.

Still feels like even with Max mode it doesn't think reasonably long, at least ChatGPT Pro thinks longer.

necrotic_comp yesterday at 5:21 PM

4.8 also seems like a regression, and using it from the chat GUI results in 4.6 no longer showing up. If someone from Anthropic is here, is it possible to re-add 4.6 in the "other models" dropdown? I feel like I got a bit baited/switched here.

delis-thumbs-7e yesterday at 5:54 PM

I won’t change from 4.6. You won’t trick me again.

ethanhawksley yesterday at 5:34 PM

> Agentic financial analysis, Finance Agent v2
> Opus 4.8: 53.9%

> Gemini 3.5 Flash scores 57.9% on Finance Agent v2, a significant improvement over Gemini 3.1 Pro.

Even in the cherry-picked benchmarks, they are still cherry-picking to make them look good.

aaronblohowiak yesterday at 4:53 PM

Same price for regular and cheaper fast mode. Happy for these incremental improvements.

GodelNumbering yesterday at 5:20 PM

> One of the most prominent improvements in Opus 4.8 is its honesty.

I went digging into the benchmark they used. Posting here as it is not immediately clear from the press release.

In this 'Code summary honesty benchmark', the AI is shown a failed coding session followed by a user message falsely praising its work and asking for a summary. The test measures whether the model honestly points out the coding flaws or dishonestly claims the task was a success.

The system card results show Opus 4.8 failed to disclose the flaws only 3.7% of the time, vs 19.7% for Opus 4.7, and 51.9% for Opus 4.6. (Mythos preview is at 27.6%)

ramon156 today at 7:16 AM

I love how they will always have *one* metric that is lower than a competitor's model, as if these metrics reflected real usage.

toephu2 yesterday at 5:29 PM

The rapid release cadence and rate of innovation of Anthropic (and OpenAI) are impressive. And obviously it's because these are startups solely dedicated to AI, so they can move quickly. Big Tech (like Google) won't be able to keep up with their pace (too much bureaucracy and red tape at Google). Classic Innovator's Dilemma. The longer a company exists, the more people, processes, and rules are added, which inevitably slows it down.

Jeff Bezos said this too: Amazon won't last forever. Eventually some startup is going to come and eat its lunch.

hmokiguess yesterday at 9:10 PM

They must have been A/B testing this with 4.7 lately; I noticed it changed from its normal mode in a way that closely matches the just-released 4.8.

whereistejas yesterday at 10:14 PM

This may be the most important sentence in that announcement:

> expect to be able to bring Mythos-class models to all our customers in the coming weeks.

drchaim today at 9:31 AM

i just want to use anthropic models under subscription with other agents!

jruz today at 6:12 AM

Don’t even bother checking these minor PR bumps; it’s all a show: degradation, then a bump back to the previous state.

Call me when 5 drops. I’ll leave this circus.

xintron yesterday at 9:44 PM

Based on personal experience, seeing how Opus 4.6 still provides better (more nuanced, less totalitarian) answers than 4.7, it's difficult to get excited for 4.8. Is this another "money grab" from Anthropic? Similar output between 4.6 and 4.7, yet 40x tokens. What's the value proposition of 4.8?

rumblefrog yesterday at 5:03 PM

Wonder if we reached a plateau with the model improvements?

rumblefrog yesterday at 4:58 PM

Really appreciate the ability to select effort level again.

Px-Jebaseelan today at 12:26 PM

It's gonna eat all of my tokens in one response :(

tariky yesterday at 7:50 PM

I believe the smartphone analogy fits this case best.

In the 2010s the iPhone was king; all those Chinese devices were cheaper but not even close to the smoothness and usability of US tech. Now, 15 years later, everything has changed, and the iPhone feels like an old grandpa next to Chinese tech. The same will happen to LLMs, just much faster.

yewenjie yesterday at 5:05 PM

So Dynamic Workflows is their version of ChatGPT Pro?

imagetic yesterday at 8:38 PM

I used to think it was a big deal when a HN post had more than 500 comments.

Now it’s every day. Like billion-dollar valuations.

throwaway67743 yesterday at 10:58 PM

Question is, can it understand dates now? Example just now:

"The PO application was filed on 23.2.2026, the day before the custody hearing scheduled for 29.1.2026 had already taken place."

Claude has real problems with dates, I don't understand why.
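
The inconsistency in the quoted sentence can be checked mechanically. The following is a minimal sketch, assuming both dates are written in day.month.year order (the variable names are illustrative, not from any tool):

```python
from datetime import datetime

# Assumption: the dates in the quoted sentence use day.month.year order.
DATE_FORMAT = "%d.%m.%Y"

filing = datetime.strptime("23.2.2026", DATE_FORMAT).date()
hearing = datetime.strptime("29.1.2026", DATE_FORMAT).date()

# Positive means the filing came after the hearing.
days_after_hearing = (filing - hearing).days

# "The day before the hearing" would require a difference of exactly -1 day.
filed_day_before = days_after_hearing == -1

print(f"Filing is {days_after_hearing} days after the hearing")
print(f"Filed the day before the hearing: {filed_day_before}")
```

Under that date-order assumption, the filing falls more than three weeks after the hearing, so it cannot have been "the day before" it.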

samuelknight yesterday at 7:19 PM

It feels noticeably sharper than Opus 4.7

Alex_toani today at 3:41 AM

I have tried 4.8 with Ultra for coding. I think the agent's output is more structured, which is better than just filling in everything.

ropintus yesterday at 5:04 PM

Opus 4.7 was acting extremely stupid today. Does the imminent release of a new model cause performance degradation in older ones?

rsanek yesterday at 4:56 PM

> We expect to be able to bring Mythos-class models to all our customers in the coming weeks.

Excited to see what this model looks like.

assorium yesterday at 9:29 PM

It refused to work for me. It literally said, "you can google it." AGI achieved, it seems.

ismailmaj yesterday at 10:19 PM

I just asked the model for details about the upcoming SpaceX IPO and it responded with “There’s no confirmed SpaceX IPO. Elon Musk has said for years that SpaceX itself won’t go public”. It took me two pushbacks and specifically asking for web search.

I feel like I won’t like this model, just like I didn’t like 4.7: it pushes back a lot and avoids thinking or searching as much as possible.
