1. They changed the default in March from high to medium, however Claude Code still showed high (took 1 month 3 days to notice and remediate)
2. Old sessions had the thinking tokens stripped, resuming the session made Claude stupid (took 15 days to notice and remediate)
3. System prompt to make Claude less verbose reducing coding quality (4 days - better)
All this to say... the experience of suspecting a model is getting worse while Anthropic publicly gaslights their user-base: "we never degrade model performance" is frustrating.
Yes, models are complex and deploying them at scale given their usage uptick is hard. It's clear they are playing with too many independent variables simultaneously.
However you are obligated to communicate honestly to your users to match expectations. Am I being A/B tested? When was the date of the last system prompt change? I don't need to know what changed, just that it did, etc.
Doing this proactively would certainly match expectations for a fast-moving product like this.
To be fair to Anthropic, they did not intentionally degrade performance.
To take the opposite side, this is the quality of software you get atm when your org is all in on vibe coding everything.
None of these problems equate to degrading model performance. Completely different team. Degraded CC harness, sure.
> Anthropic publicly gaslights their user-base: "we never degrade model performance" is frustrating.
They're not gaslighting anyone here: they're very clear that the model itself, as in Opus 4.7, was not degraded in any way (i.e. if you take them at their word, they do not drop to lower quantisations of Claude during peak load).
However, the infrastructure around it - Claude Code, etc - is very much subject to change, and I agree that they should manage these changes better and ensure that they are well-communicated.
> 2. Old sessions had the thinking tokens stripped, resuming the session made Claude stupid (took 15 days to notice and remediate)
This one was egregious: after a one hour user pause, apparently they cleared the cache and then continued to apply “forgetting” for the rest of the session after the resume!
Seems like a very basic software engineering error that would be caught by normal unit testing.