How many kidneys do you have to sell? Are 2 enough?
From the release it seems we will also get Mythos pretty soon.
Numbers looking good. We'll see how it actually performs.
Has anyone else experienced quality degradation in CC (opus 4.7) these past few days? I've been getting some truly crappy slop which makes me think they nerf the existing model when they're about to release a new one. Of course this is based off of pure vibes
Can anyone else see these X.Y updates aren't meeting the outrageous AI expectations that we were told we would see just a year ago?
I don't know what's going on lately but Opus is extremely lazy for me...
It always wants to add hacks instead of fixing things properly, and it doesn't like large tasks. It literally told me that a piece of work would take 8 hours and that it didn't want to do it on a Friday night.
I feel I keep having to fight the model to get it to work. Not sure if it's something in my prompts...
4.7 broke my trust
> And fast mode for Opus 4.8—where the model can work at 2.5× the speed—is now three times cheaper than it was for previous models.
this is what I'm happy about, if true. Opus 4.7 is frustratingly slow (and, at least in my experience, much slower than 4.5 was)
I've said it before, but I don't like Opus past version 4.5. It became unresponsive, thinking for too long without feedback, sometimes seemingly getting stuck. I guess it might be marginally better on some benchmarks, but as a coding assistant, the new models are worse. Even the new Sonnet versions do that. I'm slowly getting used to Haiku-level LLMs with the hope of running one locally at some point. It's less autonomous, but maybe that's for the best.
Meh, it’s not able to play Doom.
let me guess, "this is our best model yet"
These models are starting to feel like Windows versions. Windows 95 was a promising start, but buggy. Windows ME was a disaster. Windows XP was good, but slightly buggy. Windows Vista was a bloated disaster. Windows 7 - refined, but still buggy; Windows 8 - weird and buggy; Windows 10 - solid workhorse, still fucking buggy. Windows 11 - pretty, but not sure why it even exists.
Why did we even get Opus 4.7, what was the point?
Anthropic has now upgraded their Claude slot machine to version 4.8.
Time to gamble even more tokens at the Anthropic casino.
First impression... this catches issues that 4.7 missed, which caught issues that 4.6 missed... which caught issues that 4.5 missed...
Seems like a step in the right direction. Doesn't seem like it uses tokens more than 4.7... the token usage jumped a bunch from 4.6 to 4.7, but this seems like 4.7 or maybe even a little less.
I'm happy with this release.
I hope this fixes the absolute shitshow that is 4.7 and its awful “adaptive reasoning”. I tried that a few times then reverted to 4.6.
4.6 is better
How about the benchmarks? What effort level did it use?
So, has it replaced the entire startup yet?
This is Anthropic's 5.5
If this model is more honest, it must be honestly praising my efforts every first sentence.
Interesting, I've been using 4.7 since it came out and it was pretty good for me. But in the last day or so it turned dumb. Is this normal just before they release a new one?
Complete garbage. Error, error, error. Still lags several versions behind on APIs. Can't even get any info on the model. Guessing not from this year.
Also, look at this C++ beauty, where it also uses an obsolete API.
instance = wgpuCreateInstance(&instanceDesc);
But just how exactly would this work in any context when instance is never declared?
AGI postponed?
Did they reduce security research capabilities even further with this release? (they did it for opus 4.7)
> As always, we ran a detailed alignment assessment on the model before release. In terms of positive traits, our Alignment team concluded that Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” The assessment also showed Opus 4.8 to have rates of misaligned behavior (such as deception or cooperation with misuse) that are substantially lower than Opus 4.7, and similar to our best-aligned model, Claude Mythos Preview. The full alignment assessment, accompanied by a suite of pre-deployment safety tests, is reported in the Claude Opus 4.8 System Card.
Controversial opinion, but I actually _like_ a model that can deceive me, that actually is a sign of intelligence, and is different from hallucination. When companies say their model is more "aligned", I automatically think they mean it's more censored.
Crazy they bring up honesty, when Claude models are literally known for straight-up lying about things they have done and acting like they did what you asked.
Gemini pro is embarrassing
Had a feeling this was coming as in the past week 4.7 started to get dumb.
I'm tired, boss. I'm already being perfectly gaslit by the current models.
Now I get why, in the last few days, Claude Code limits were lasting only a few prompts...
I'm beginning to find it comical how every model release presents itself as superior to every other model on the market, but they always leave just one test where some other model was modestly better, just in case.
"We’re making swift progress on developing these safeguards and expect to be able to bring Mythos-class models to all our customers in the coming weeks."
Lol you still use GPT 5.5 bro we’re all back on Opus 4.8!
Looking forward to people saying how it’s actually shittier and they’re going back to [some earlier cheaper model]
It is bananas that, with a supposed $965B valuation, this org's page at https://huggingface.co/Anthropic to this day shows:
models 0
None public yet
How is this even possible and OK with them?
Meh
what a fucking frontier!
Disappointed to say the least.
yawn
Reminder the only benchmark that really matters is the one that measures the ability for the model to do real world tasks that someone would pay for on Upwork that would take ~12 hrs for a human to do.
The best model has a < 5% pass rate. These are incredibly simple jobs that you wouldn't pay much for. These things fail miserably. Stop falling for this dumb marketing, these things are legitimately useless in the real world unless you love mediocrity and have no standards.
https://labs.scale.com/leaderboard/rli
Stop frying your brain with these useless tools, reducing your output to the mean. You people are betting your competency on the quality and quantity of tokens you'll have access to, which, guess what, will be the same as everyone else's.
There are handmade watchmakers in Switzerland, and mass manufacturers of watches in Asia. Who is more valuable as an individual: the guy who knows how to push the buttons on a conveyor belt in Vietnam, or the guy who makes one watch a month in Switzerland?
Your vibe coded slop isn't impressive either, sorry. None of it.
I, for lack of a better word, dislike anyone who anthropomorphizes AI.