Hacker News

Claude Opus 4.8

1663 points by craigmart yesterday at 4:49 PM | 1297 comments

Comments

carlos-menezes yesterday at 5:05 PM

I, for lack of a better word, dislike anyone who anthropomorphizes AI.

DeathArrow today at 9:28 AM

How many kidneys do you have to sell? Are 2 enough?

sourcecodeplz yesterday at 5:26 PM

From the release it seems we will also get Mythos pretty soon.

plumocracy yesterday at 4:56 PM

Numbers looking good. We'll see how it actually performs.

lylo yesterday at 6:16 PM

2 hours after I fork out for Codex Pro… :-|

s-a-p yesterday at 5:23 PM

Has anyone else experienced quality degradation in CC (Opus 4.7) these past few days? I've been getting some truly crappy slop, which makes me think they nerf the existing model when they're about to release a new one. Of course, this is based on pure vibes.

1970-01-01 yesterday at 5:03 PM

Can anyone else see these X.Y updates aren't meeting the outrageous AI expectations that we were told we would see just a year ago?

noncoml today at 6:20 AM

I don't know what's going on lately but Opus is extremely lazy for me...

It always wants to add hacks instead of fixing things properly, and it doesn't like large pieces of work. It literally told me that one piece of work would take 8 hours and that it didn't want to do it on a Friday night.

I feel I keep having to fight the model to get it to work. Not sure if it's something in my prompts...

blurbleblurble today at 12:59 AM

4.7 broke my trust

insane_dreamer yesterday at 8:13 PM

> And fast mode for Opus 4.8—where the model can work at 2.5× the speed—is now three times cheaper than it was for previous models.

this is what I'm happy about, if true. Opus 4.7 is frustratingly slow (and, at least in my experience, much slower than 4.5 was)

lukaslalinsky yesterday at 6:25 PM

I've said it before, but I don't like Opus past version 4.5. It became unresponsive, thinking for too long without feedback, sometimes seemingly getting stuck. I guess it might be marginally better on some benchmarks, but when used as a coding assistant, the new models are worse. Even the new Sonnet versions do that. I'm slowly getting used to Haiku-level LLMs, with the hope of running one locally at some point. It's less autonomous, but maybe that's for the best.

itrunsdoomguy today at 12:08 PM

Meh, it’s not able to play Doom.

iamsaitam yesterday at 10:13 PM

let me guess, "this is our best model yet"

iLemming yesterday at 6:24 PM

These models are starting to feel like Windows versions. Windows 95 was a promising start, but buggy. Windows ME was a disaster. Windows XP was good, but slightly buggy. Windows Vista was a bloated disaster. Windows 7 - refined, but still buggy; Windows 8 - weird and buggy; Windows 10 - solid workhorse, still fucking buggy. Windows 11 - pretty, but not sure why it even exists.

Why did we even get Opus 4.7, what was the point?

rvz yesterday at 4:54 PM

Anthropic has now upgraded their Claude slot machine to version 4.8.

Time to gamble even more tokens at the Anthropic casino.

dbg31415 today at 3:18 AM

First impression... this catches issues that 4.7 missed, which caught issues that 4.6 missed... which caught issues that 4.5 missed...

Seems like a step in the right direction. Doesn't seem like it uses tokens more than 4.7... the token usage jumped a bunch from 4.6 to 4.7, but this seems like 4.7 or maybe even a little less.

I'm happy with this release.

saaaaaam yesterday at 5:01 PM

I hope this fixes the absolute shitshow that is 4.7 and its awful “adaptive reasoning”. I tried that a few times then reverted to 4.6.

lidg3ai today at 7:01 AM

4.6 is better

firemelt yesterday at 5:55 PM

How about the benchmarks? What effort did it use?

docmars yesterday at 11:31 PM

So, has it replaced the entire startup yet?

m3kw9 yesterday at 10:46 PM

This is Anthropic's 5.5

HlessClaudesman yesterday at 4:53 PM

If this model is more honest, it must be honestly praising my efforts every first sentence.

sgt yesterday at 6:33 PM

Interesting, I've been using 4.7 since it came out and it was pretty good for me. But in the last day or so it turned dumb. Is this normal just before they release a new one?

AtNightWeCode yesterday at 7:27 PM

Complete garbage. Error, error, error. Still lags several versions behind on APIs. Can't even get any info on the model. Guessing not from this year.

Also, look at this C++ beauty, where it also uses an obsolete API.

instance = wgpuCreateInstance(&instanceDesc);

But how exactly would this work in any context when instance is never declared?

catigula yesterday at 5:39 PM

AGI postponed?

zb3 yesterday at 5:00 PM

Did they reduce security research capabilities even further with this release? (they did it for opus 4.7)

guluarte yesterday at 4:58 PM

so it is worse than gpt 5.5 for coding?

behnamoh yesterday at 4:56 PM

> As always, we ran a detailed alignment assessment on the model before release. In terms of positive traits, our Alignment team concluded that Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” The assessment also showed Opus 4.8 to have rates of misaligned behavior (such as deception or cooperation with misuse) that are substantially lower than Opus 4.7, and similar to our best-aligned model, Claude Mythos Preview. The full alignment assessment, accompanied by a suite of pre-deployment safety tests, is reported in the Claude Opus 4.8 System Card.

Controversial opinion, but I actually _like_ a model that can deceive me, that actually is a sign of intelligence, and is different from hallucination. When companies say their model is more "aligned", I automatically think they mean it's more censored.

impulser_ yesterday at 4:57 PM

Crazy they bring up honesty, when Claude models are literally known for straight up lying about things they have done and trying to act like they did what you asked.

AbuAssar yesterday at 7:08 PM

Gemini pro is embarrassing

NSCaffeine yesterday at 10:11 PM

Had a feeling this was coming as in the past week 4.7 started to get dumb.

ionwake yesterday at 9:36 PM

I'm tired, boss. I'm already being perfectly gaslit by the current models.

vb-8448 yesterday at 6:16 PM

Now I get why, in the last few days, Claude Code limits were lasting only a few prompts...

stainablesteel yesterday at 9:37 PM

i'm beginning to find it comical how every model release always presents itself as superior to every other model on the market, but they always leave just one test where some other model was modestly better, just in case.

maltemalte yesterday at 5:55 PM

"We’re making swift progress on developing these safeguards and expect to be able to bring Mythos-class models to all our customers in the coming weeks."

k_plankenhorn today at 3:26 PM

[dead]

thibran yesterday at 6:30 PM

Nice, now make it 20x cheaper.

Marciplan yesterday at 5:08 PM

Lol you still use GPT 5.5 bro we’re all back on Opus 4.8!

deadbabe yesterday at 5:01 PM

Looking forward to people saying how it’s actually shittier and they’re going back to [some earlier cheaper model]

diimdeep today at 4:03 AM

It is bananas that, with a supposed $965B valuation, this org's page at https://huggingface.co/Anthropic to this day shows:

  models 0
  None public yet

How is this even possible and OK with them?

damsta yesterday at 9:20 PM

Meh

firemelt yesterday at 5:39 PM

what a fucking frontier!

McDownloads yesterday at 4:52 PM

Disappointed to say the least.

ecommerceguy yesterday at 9:25 PM

yawn

dakolli yesterday at 6:58 PM

Reminder the only benchmark that really matters is the one that measures the ability for the model to do real world tasks that someone would pay for on Upwork that would take ~12 hrs for a human to do.

The best model has a < 5% pass rate. These are incredibly simple jobs that you wouldn't pay much for. These things fail miserably. Stop falling for this dumb marketing, these things are legitimately useless in the real world unless you love mediocrity and have no standards.

https://labs.scale.com/leaderboard/rli

Stop frying your brain with these useless tools, reducing your output to the mean. You people are betting your competency on the quality and quantity of tokens you'll have access to, which, guess what, will be the same as everyone else's.

There are handmade watchmakers in Switzerland, and mass manufacturers of watches in Asia. Who is more valuable as an individual: the guy who knows how to push the buttons on a conveyor belt in Vietnam, or the guy who makes one watch a month in Switzerland?

Your vibe coded slop isn't impressive either, sorry. None of it.

mikdan today at 1:10 PM

[flagged]

sspoisk today at 10:27 AM

[flagged]

blueblazin today at 12:55 PM

[dead]
