Hacker News

An update on recent Claude Code quality reports

804 points | by mfiguiere yesterday at 5:48 PM | 617 comments | view on HN

Comments

VadimPR yesterday at 7:06 PM

Appreciate the honesty from the team.

At the same time, personally I find prioritizing quality over quantity of output to be a better personal strategy. Ten partially buggy features really aren't as good as three quality ones.

sreekanth850 today at 5:27 AM

Who’s going to pay for the exorbitant number of tokens Claude used without delivering any meaningful outcome? I spent many sessions getting zero results, and when I posted about it on their subreddit, all I got were personal attacks from bots and fanboys. I instantly cancelled my subscription and moved to Codex.

Also, it may just be a coincidence that the article was published right before the GPT 5.5 launch, and that they then restored the original model while releasing a PR statement claiming it was due to bugs.

voxelc4L today at 12:58 AM

I’ve stuck to the non-1M-context Opus 4.6 and it works really well for me, even with ongoing context compression. I honestly couldn’t deal with the 1M-context change and then the compounding token-devouring nonsense of 4.7. I sincerely hope Anthropic is seeing all of this and taking note. They have their work cut out for them.

show 1 reply
jwpapi yesterday at 8:40 PM

Those are exactly the kind of issues you run into when your app is AI-coded: you build one thing and kill something else.

You have too many benchmarks, and the wrong ones.

rebolek yesterday at 9:42 PM

> On April 16, we added a system prompt instruction to reduce verbosity.

What verbosity? Most of the time I don’t know what it’s doing.

show 1 reply
deaux yesterday at 10:33 PM

They had this ready and timed it for the GPT 5.5 announcement. Zero chance it's a coincidence.

RamblingCTO today at 8:49 AM

Doesn't change anything about Opus 4.7 being an absolute buffoon. Even going back to Opus 4.6 doesn't feel like the magical period of maybe 3-4 weeks ago. Gonna go back to OpenAI.

ankit219 yesterday at 9:21 PM

An interesting question is why these optimizations were pushed so aggressively in the first place, especially given this was the period when they themselves were running a 2x promotion, presumably without seeing any slowdown in demand.

Alifatisk yesterday at 6:03 PM

It’s incredible how forgiving you guys are with Anthropic and their errors, especially considering you pay a high price for their service and receive lower quality than expected.

show 17 replies
gilrain yesterday at 7:38 PM

Hi Boris, random observer here. Would you consider apologizing to the community for mistakenly closing tickets related to this and then wrongly keeping them closed when, internally, you realized they were legitimate?

I think an apology for that incident would go a long way.

nopurpose today at 3:07 AM

Weren't there reports that quality decreased when using non-CC harnesses too? Nothing in the blog post explains that.

natdempk yesterday at 6:17 PM

As an end-user, I feel like they're kind of over-cooking and under-describing the features and behavior of what is, at the end of the day, a tool. Today the models are in a place where the context management, reasoning effort, etc. all need to be very stable to work well.

The thing about session resumption changing the context of a session by truncating thinking came as a surprise to me; I don't think that's documented behavior anywhere.

It's interesting to look at how many bugs are filed on the various coding-agent repos. Hard to say how many are real or unique, but the quantities feel very high, and it's not hard to run into real bugs rapidly as you use the various features and slash commands.
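To illustrate what that kind of undocumented thinking-truncation on resume might look like, here is a guess at the mechanism (a toy sketch, not Anthropic's actual code; the message shape and block types are assumptions):

```python
def resume_session(history):
    """Hypothetical resumption step: drop assistant "thinking" blocks
    from the replayed transcript, keeping only visible content.
    This changes the context the model sees versus the live session."""
    resumed = []
    for msg in history:
        if msg["role"] == "assistant":
            # keep only non-thinking blocks (text, tool calls, etc.)
            content = [b for b in msg["content"] if b.get("type") != "thinking"]
            resumed.append({**msg, "content": content})
        else:
            resumed.append(msg)
    return resumed
```

If the harness does anything like this, a resumed session is subtly not the same conversation the model had before, which would explain behavior drift after resuming.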

zagwdt today at 7:56 AM

ngl, lost a lot of trust in CC after reading this, especially point 1

how do you just do that to millions of users building prod code with your shit

zem yesterday at 10:48 PM

ugh, caching based on idle time is horrible for my usage anyway; since Claude is both fairly slow and doesn't really have much of a daily quota, I often tell it to do something, wander off, and come back to check on it when I next think about it. I always vaguely assumed my session would not "detect" the intervening time, since it was all async. I guess from a global perspective, time-based cache eviction makes sense.
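The idle-based eviction being described can be sketched roughly like this (a toy illustration only; the real caching policy, keys, and TTL are not public):

```python
import time

class IdleTTLCache:
    """Minimal sketch of idle-based eviction: an entry is dropped once
    it has gone unused for longer than `ttl` seconds, so a session that
    sits idle past the window comes back to a "cold" cache."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store = {}  # key -> (value, last_access_time)

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, last = entry
        if time.monotonic() - last > self.ttl:
            # idle too long: evict; the caller must rebuild from scratch
            del self._store[key]
            return None
        # touch: any access resets the idle clock
        self._store[key] = (value, time.monotonic())
        return value
```

Under a policy like this, "wander off and come back later" is exactly the access pattern that pays the eviction cost, which matches the experience described above.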

kristianc yesterday at 8:14 PM

To think we'd have known about this in advance if they'd just open-sourced Claude Code, rather than being forced into this embarrassing post-mortem. Sunlight is the best disinfectant.

KronisLV yesterday at 6:49 PM

This reads like good news! They probably still lost a bunch of users due to the negative public sentiment and not responding quickly enough, but at least they addressed it with a good bit of transparency.

xlayn yesterday at 6:10 PM

If Anthropic is doing this as a result of "optimizations", they need to stop doing that and raise the price. The other thing: there should be a way to test a model and validate that it answers exactly the same each time. I have experienced it twice... when a new model is about to come out... the quality of the top-dog one starts going down... and bam... the new model is so good... like the previous one was 3 months ago.

The other thing: when Anthropic turns on lazy Claude (I want to coin here the term Claudez for the version of Claude that's lazy... Claude zzZZzz = Claudez), that thing is terrible... you ask the model for something... and it's like... oh yes, that will probably depend on memory bandwidth... do you want me to search that?...

YES... DO IT... FRICKING MACHINE..

show 5 replies
wg0 today at 2:28 AM

A heavily vibe-coded CLI would have tons of issues, regularly.

LLMs over-edit, and it's a known problem.

einrealist yesterday at 6:25 PM

Is 'refactoring Markdown files' already a thing?

show 1 reply
2001zhaozhao yesterday at 6:18 PM

How about just not changing the harness abruptly in the first place? Make new system-prompt changes "experimental" first so you can gather feedback.

show 1 reply
tdg5 yesterday at 11:45 PM

I missed the part about the refunds…

davidfstr yesterday at 6:46 PM

Good on Anthropic for giving an update & token refund, given the recent rumors of an inexplicable drop in quality. I applaud the transparency.

show 1 reply
throwaway2027 yesterday at 7:52 PM

Cool but I switched to Codex for the time being.

hirako2000 yesterday at 11:37 PM

In other words we did the right things, but we understand feedback, oh and bugs happen.

8note yesterday at 8:57 PM

Something I note from this: it is not a model-weights change, but a hidden change Anthropic makes to how outputs are produced, one that can tune the quality of the "model" up and down without breaking the "we aren't changing the model" promise.

How often do these changes happen?

motbus3 yesterday at 6:20 PM

I had similar experience just before 4.5 and before 4.6 were released.

Somehow, three times makes me not feel confident in this response.

Also, if this is all true and correct, how the heck do they validate quality before shipping anything?

Shipping software without quality is a pretty easy job, even without AI. Just saying...

bearjaws yesterday at 5:59 PM

The issue making Claude just not do any work was infuriating, to say the least. I already ran at a medium thinking level so I was never impacted by that part, but having to constantly say "okay, now do X like you said" was annoying.

Again goes back to the "intern" analogy people like to make.

varispeed today at 10:21 AM

It appears that Opus 4.7 has already been nerfed. I can't get any sensible results since yesterday; it just keeps running in circles. Even mentioning that it is committing fraud by doing superficial work it was specifically told not to do doesn't help.

show 1 reply
gneggh today at 8:03 AM

Not the first time. Still not showing thinking, are we?

ayhanfuat yesterday at 6:07 PM

Reading the "Going forward" section I see that they have zero understanding of the main complaints.

show 1 reply
noname120 today at 11:24 AM

So now the solution is to input a “ping” message every hour so that it keeps the cache warm?
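If anyone actually wanted that workaround, it could be sketched as follows (purely hypothetical; the one-hour idle window and the `send_ping` callback are assumptions, not a documented API):

```python
import threading

def keep_cache_warm(send_ping, interval_s=55 * 60, stop=None):
    """Hypothetical keep-alive: call send_ping() just inside the assumed
    one-hour idle window so the session's cache never goes cold.
    Returns the Event that stops the loop when set."""
    stop = stop or threading.Event()
    while not stop.wait(interval_s):  # wait() returns False on timeout
        send_ping()  # any trivial message would reset the idle clock
    return stop
```

Which is, of course, exactly the kind of thing users shouldn't have to do — burning tokens just to avoid a provider-side eviction policy.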

walthamstow yesterday at 7:18 PM

So we weren't going mad then!

ritonlajoie yesterday at 11:18 PM

Yesterday CC created a FastAPI /healthz endpoint and told me it's the gold standard (with the trailing z). Today I stopped my Max sub and will be trying Codex.

show 2 replies
ElFitz yesterday at 7:04 PM

Now we know why Anthropic banned the use of subscriptions with other agent harnesses: they partially rely on the Claude Code CLI to control token usage through various settings.

And it also tells us why we shouldn’t use their harness anyway: they constantly fiddle with it in ways that can seriously impact outcomes without even a warning.

vicchenai yesterday at 9:54 PM

Had this happen to me mid-refactor and spent 20 minutes wondering if I'd gone crazy. Honestly, the one-hour threshold feels pretty arbitrary; sometimes you just step away to think.

whalesalad yesterday at 7:58 PM

The funny thing is, in the last 3 days Claude has gotten substantially worse. So this claim, "All three issues have now been resolved as of April 20 (v2.1.116)" does not land with me at all.

setnone yesterday at 6:14 PM

Good on them for resolving all three issues, but is it any good again?

show 1 reply
psubocz yesterday at 8:28 PM

> All three issues have now been resolved as of April 20 (v2.1.116).

The latest in Homebrew is 2.1.108, so not fixed, and I don't see Opus 4.7 on the models list... Is Homebrew a second-class citizen, or am I in the B group?

antirez yesterday at 7:45 PM

Zero QA basically.

show 1 reply
system2 today at 6:49 AM

Whatever they did, with the max plan, my daily usage quota was consumed in less than 10 minutes. Weird, let's hope they fix the usage now.

hajile yesterday at 7:21 PM

My takeaway is that they knew they were changing a bunch of stuff while their reps were gaslighting us in the comments here.

Why should we ever trust what they say again, or trust that they won't be rug-pulling again once this blows over?

EugeneOZ yesterday at 8:59 PM

If you think you can just silently modify the model without any announcement and only react when it doesn't go unnoticed, then be 100% sure that your clients will check every possible alternative and will leave you as soon as they find anything similar in quality (and no, not a degraded one).

ramesh31 yesterday at 7:45 PM

Effort should not be configurable for Opus, it should be set to a single default that provides the highest level of capability. There are zero instances in which I am willing to accept a lesser result in exchange for a slightly faster response from Opus. If that were the case I would be using Flash or Haiku.

jruz yesterday at 7:33 PM

Too late bro, switched to Codex I’m done with your bullshit.

systemvoltage yesterday at 6:42 PM

Interesting. All three seem like they're obviously going to impact quality, e.g. reducing the effort from high to medium.

So then, there must have been an explicit internal guidance/policy that allowed this tradeoff to happen.

Did they fix just the bug or the deeper policy issue?

tontinton yesterday at 7:32 PM

Or you can use a non-vibe-designed, efficient Rust TUI coding agent made by yours truly; all my coworkers use it too :) called https://maki.sh!

lua plugins WIP

maxrev17 yesterday at 8:57 PM

Please for the love of god just put the max price plan up like 4x or 5x in cost and make it actually work.

rishabhaiover yesterday at 6:39 PM

Boris gaslit us over all the quality-related incidents for weeks, not acknowledging these problems.

show 1 reply
