Claude Opus 4.8

1663 points • by craigmart • yesterday at 4:49 PM • 1297 comments

Comments

carlos-menezes • yesterday at 5:05 PM

I, for lack of a better word, dislike anyone who anthropomorphizes AI.

DeathArrow • today at 9:28 AM

How many kidneys do you have to sell? Are 2 enough?

sourcecodeplz • yesterday at 5:26 PM

From the release it seems we will also get Mythos pretty soon.

plumocracy • yesterday at 4:56 PM

Numbers looking good. We'll see how it actually performs.

lylo • yesterday at 6:16 PM

2 hours after I fork out for Codex Pro… :-|

s-a-p • yesterday at 5:23 PM

Has anyone else experienced quality degradation in Claude Code (Opus 4.7) these past few days? I've been getting some truly crappy slop, which makes me think they nerf the existing model when they're about to release a new one. Of course, this is based off of pure vibes.

1970-01-01 • yesterday at 5:03 PM

Can anyone else see these X.Y updates aren't meeting the outrageous AI expectations that we were told we would see just a year ago?

noncoml • today at 6:20 AM

I don't know what's going on lately, but Opus is extremely lazy for me...

It always wants to add hacks instead of fixing things properly, and it doesn't like large tasks. It literally told me that a piece of work would take 8 hours and that it didn't want to do it on a Friday night.

I feel I keep having to fight the model to get it to work. Not sure if it's something in my prompts...

blurbleblurble • today at 12:59 AM

4.7 broke my trust

insane_dreamer • yesterday at 8:13 PM

> And fast mode for Opus 4.8—where the model can work at 2.5× the speed—is now three times cheaper than it was for previous models.

This is what I'm happy about, if true. Opus 4.7 is frustratingly slow (and, at least in my experience, much slower than 4.5 was).

lukaslalinsky • yesterday at 6:25 PM

I've said it before, but I don't like Opus past version 4.5. It became unresponsive, thinking for too long without feedback and sometimes seemingly getting stuck. It might be marginally better on some benchmarks, but as a coding assistant the new models are worse. Even the new Sonnet versions do that. I'm slowly getting used to Haiku-level LLMs in the hope of running one locally at some point. It's less autonomous, but maybe that's for the best.

itrunsdoomguy • today at 12:08 PM

Meh, it’s not able to play Doom.

iamsaitam • yesterday at 10:13 PM

let me guess, "this is our best model yet"

iLemming • yesterday at 6:24 PM

These models are starting to feel like Windows versions. Windows 95 was a promising start, but buggy. Windows ME was a disaster. Windows XP was good, but slightly buggy. Windows Vista was a bloated disaster. Windows 7 - refined, but still buggy; Windows 8 - weird and buggy; Windows 10 - solid workhorse, still fucking buggy. Windows 11 - pretty, but not sure why it even exists.

Why did we even get Opus 4.7, what was the point?

rvz • yesterday at 4:54 PM

Anthropic has now upgraded their Claude slot machine to version 4.8.

Time to gamble even more tokens at the Anthropic casino.

dbg31415 • today at 3:18 AM

First impression... this catches issues that 4.7 missed, which caught issues that 4.6 missed... which caught issues that 4.5 missed...

Seems like a step in the right direction. It doesn't seem to use more tokens than 4.7... token usage jumped a bunch from 4.6 to 4.7, but this seems about the same as 4.7, or maybe even a little less.

I'm happy with this release.

saaaaaam • yesterday at 5:01 PM

I hope this fixes the absolute shitshow that is 4.7 and its awful “adaptive reasoning”. I tried that a few times then reverted to 4.6.

lidg3ai • today at 7:01 AM

4.6 is better

firemelt • yesterday at 5:55 PM

How about the benchmarks? What effort level were they run at?

docmars • yesterday at 11:31 PM

So, has it replaced the entire startup yet?

m3kw9 • yesterday at 10:46 PM

This is Anthropic's 5.5

HlessClaudesman • yesterday at 4:53 PM

If this model is more honest, it must be honestly praising my efforts in the first sentence of every reply.

sgt • yesterday at 6:33 PM

Interesting, I've been using 4.7 since it came out and it was pretty good for me. But in the last day or so it turned dumb. Is this normal just before they release a new one?

AtNightWeCode • yesterday at 7:27 PM

Complete garbage. Error, error, error. It still lags several versions behind on APIs. Can't even get any info on the model. Guessing it's not from this year.

Also, look at this C++ beauty, where it also uses an obsolete API:

instance = wgpuCreateInstance(&instanceDesc);

But how exactly would this work in any context when instance is never declared?

catigula • yesterday at 5:39 PM

AGI postponed?

zb3 • yesterday at 5:00 PM

Did they reduce security research capabilities even further with this release? (They did it for Opus 4.7.)

guluarte • yesterday at 4:58 PM

so it is worse than gpt 5.5 for coding?

behnamoh • yesterday at 4:56 PM

> As always, we ran a detailed alignment assessment on the model before release. In terms of positive traits, our Alignment team concluded that Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” The assessment also showed Opus 4.8 to have rates of misaligned behavior (such as deception or cooperation with misuse) that are substantially lower than Opus 4.7, and similar to our best-aligned model, Claude Mythos Preview. The full alignment assessment, accompanied by a suite of pre-deployment safety tests, is reported in the Claude Opus 4.8 System Card.

Controversial opinion, but I actually _like_ a model that can deceive me; that is actually a sign of intelligence, and it's different from hallucination. When companies say their model is more "aligned", I automatically think they mean it's more censored.

impulser_ • yesterday at 4:57 PM

Crazy that they bring up honesty when Claude models are literally known for straight-up lying about things they've done and acting like they did what you asked.

AbuAssar • yesterday at 7:08 PM

Gemini Pro is embarrassing

NSCaffeine • yesterday at 10:11 PM

Had a feeling this was coming, since in the past week 4.7 started to get dumb.

ionwake • yesterday at 9:36 PM

I'm tired, boss. I'm already being perfectly gaslit by the current models.

vb-8448 • yesterday at 6:16 PM

Now I get why, in the last few days, Claude Code limits were lasting only a few prompts...

stainablesteel • yesterday at 9:37 PM

i'm beginning to find it comical how every model release always presents itself as superior to every other model on the market, but they always leave just one test where some other model was modestly better, just in case.

maltemalte • yesterday at 5:55 PM

"We’re making swift progress on developing these safeguards and expect to be able to bring Mythos-class models to all our customers in the coming weeks."

thibran • yesterday at 6:30 PM

Nice, now make it 20x cheaper.

Marciplan • yesterday at 5:08 PM

Lol you still use GPT 5.5 bro we’re all back on Opus 4.8!

deadbabe • yesterday at 5:01 PM

Looking forward to people saying how it’s actually shittier and they’re going back to [some earlier cheaper model]

diimdeep • today at 4:03 AM

It is bananas that, with a supposed $965B valuation, this org's page at https://huggingface.co/Anthropic still shows, to this day:

  models 0
  None public yet

How is this even possible, and how is that OK with them?

damsta • yesterday at 9:20 PM

Meh

firemelt • yesterday at 5:39 PM

what a fucking frontier!

McDownloads • yesterday at 4:52 PM

Disappointed to say the least.

ecommerceguy • yesterday at 9:25 PM

yawn

dakolli • yesterday at 6:58 PM

Reminder: the only benchmark that really matters is the one that measures the model's ability to do real-world tasks that someone would pay for on Upwork and that would take a human ~12 hours.

The best model has a < 5% pass rate. These are incredibly simple jobs that you wouldn't pay much for. These things fail miserably. Stop falling for this dumb marketing, these things are legitimately useless in the real world unless you love mediocrity and have no standards.

https://labs.scale.com/leaderboard/rli

Stop frying your brain with these useless tools, reducing your output to the mean. You people are betting your competency on the quality and quantity of tokens you'll have access to, which, guess what, will be the same as everyone else's.

There are handmade watchmakers in Switzerland, and mass manufacturers of watches in Asia. Who is more valuable as an individual, the guy who knows how to push the buttons on a conveyor belt in Vietnam or the guy who makes one watch a month in Switzerland?

Your vibe coded slop isn't impressive either, sorry. None of it.
