Hacker News

I cancelled Claude: Token issues, declining quality, and poor support

888 points by y42, yesterday at 3:59 PM | 526 comments | view on HN

Comments

wg0 yesterday at 6:14 PM

I write detailed specs. Multifile with example code. In markdown.

Then hand over to Claude Sonnet.

With hard requirements listed, I found that the generated code missed requirements, had duplicate code, or even unnecessary code wrangling data (mapping objects into new objects of narrower types when it wasn't needed), along with tests that fake results or work around failures just to pass.

So it turns out that I'm not writing code, I'm reading lots of code.

One thing I know first hand from before Gen AI is that writing code is way easier. It's reading code, understanding it, and building a mental model that's way more labour intensive.

Therefore I need more time and effort with Gen AI than I needed before, because I have to read a lot of code, understand it, and ensure it adheres to the mental model I have.

Hence Gen AI, at the price point Anthropic offers, is a net negative for me. I am not vibe coding; I'm building real software that real humans depend upon, and my users deserve better attention and focus from me. I'll be cancelling my subscription shortly.

show 24 replies
rectang yesterday at 4:50 PM

I feel like I'm using Claude Opus pretty effectively and I'm honestly not running up against limits in my mid-tier subscriptions. My workflow is more "copilot" than "autopilot", in that I craft prompts for contained tasks and review nearly everything, so it's pretty light compared to people doing vibe coding.

The market-leading technology is pretty close to "good enough" for how I'm using it. I look forward to the day when LLM-assisted coding is commoditized. I could really go for an open source model based on properly licensed code.

show 9 replies
janwillemb yesterday at 4:40 PM

This is what worries me. People become dependent on these GenAI products that are proprietary, not transparent, and need a subscription. People build on them like they're a solid foundation, but all of a sudden the owner just pulls the foundation from under your building.

show 13 replies
wood_spirit yesterday at 5:40 PM

So many coworkers and I have been struggling with a big cognitive decline in Claude over the last two months. 4.5 was useful and 4.6 was great. I had my own little benchmark: 4.5 could just about keep track of a two-way pointer merge loop, whereas 4.6 managed a 3-way and the 1M context managed k-way. And this ability to track braids directly helped it understand real production code, make changes, be useful, etc.

But then two months ago 4.6 started getting forgetful and making very dumb decisions. Everyone started comparing notes and realising it wasn't "just them". And 4.7 isn't much better; the last few weeks we keep having to battle the automatic effort-level downgrade and so on. So much friction as you think "that was dumb", go check the settings again, and see there has been some silent downgrade.

We all miss the early days of 4.6, which just shows you can have a good, useful model. LLMs can be really powerful, but in delivering them to the mass market Anthropic throttles and downgrades them into something not useful.

My thinking is that soon DeepSeek reaches the more-than-good-enough 4.6+ level and everyone can get off the Claude pay-more-for-less trajectory. We don't need much more than what we've already had a glimpse of and now know is possible. We just need it in our control, and provisioned rather than metered, so we can depend upon it.

show 2 replies
wilbur_whateley yesterday at 4:29 PM

Claude with Sonnet on medium effort just used 100% of my session limit, plus some extra dollars, thought for 53 minutes, and said:

API Error: Claude's response exceeded the 32000 output token maximum. To configure this behavior, set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable.
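For anyone hitting the same wall: the error message itself names the knob. A minimal sketch, assuming a POSIX shell (the variable name is taken verbatim from the error above; 64000 is an arbitrary example value, and what your plan actually allows may differ):

```shell
# Raise Claude Code's output-token cap for the current shell session.
# 64000 is an example, not a recommendation.
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
echo "$CLAUDE_CODE_MAX_OUTPUT_TOKENS"
```

Put it in your shell profile to make it stick; unset it to go back to the default.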

show 6 replies
anonyfox yesterday at 5:02 PM

My max20 sub has been sitting mostly unused since April; Codex with 5.4 (and now 5.5), even with fast mode (= double token costs), is night and day. Opus produces convincing failures: it either forgets half the important details or silently decides to be "pragmatic" (read: technical-debt bandaids or worse), then claims success even as everything crashes and burns after the changes. And if you point out the errors, it makes even more messes. Opus works really well for one-shotting greenfield scopes, but for iterating on it later or doing complex integrations it's just unusable, even harmfully bad.

GPT 5.4+ takes its time, unprompted considers edge cases that in fact turn out to be correct, saves me subsequent error-hunting turns, and finally delivers. Plus no "this doesn't look like malware" or "actually wait" thinking loops for minutes over a one-liner script change.

show 3 replies
zkmon yesterday at 4:41 PM

Yesterday was a realization point for me. I gave a simple extraction task to Claude Code with a local LLM and it "whirred" and "purred" for 10 minutes. Then I submitted the same data and prompt directly to the model via the llama_cpp chat UI and the model single-shotted it in under a minute. So obviously something is wrong with the coding agent or the way it is talking to the LLM.

Now I'm looking for an extremely simple open-source coding agent. Nanocoder doesn't seem to install on my Mac and it brings node-modules bloat, so no. Opencode seems not quite open-source. For now, I'm doing the work of the coding agent myself and using the llama_cpp web UI. Chugging along fine.
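For what it's worth, the "do the agent's work yourself" loop can be sketched in a few lines. This is a toy, not a real agent: it assumes a llama.cpp server running locally with its OpenAI-compatible chat endpoint on the default port 8080, and `extract_blocks` is a hypothetical helper for pulling fenced code out of a reply:

```python
import json
import re
import urllib.request

# llama.cpp's built-in server exposes an OpenAI-compatible endpoint.
LLAMA_URL = "http://localhost:8080/v1/chat/completions"

def extract_blocks(reply: str) -> list[str]:
    """Pull the bodies of fenced code blocks out of a model reply."""
    return re.findall(r"```[^\n]*\n(.*?)```", reply, flags=re.DOTALL)

def ask(prompt: str) -> str:
    """One round-trip to the local llama.cpp server."""
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]}).encode()
    req = urllib.request.Request(LLAMA_URL, body, {"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    reply = ask("Write a Python function that reverses a string.")
    for block in extract_blocks(reply):
        print(block)  # review by hand before pasting anywhere
```

The whole point is that there's no magic: the "agent" part is just prompt in, code blocks out, human review in between.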

show 8 replies
drunken_thor yesterday at 4:40 PM

AI services are only weakly incentivized to reduce token usage. They want high token usage; it makes you pay more. They are going to continually test where the limit is: what is the maximum token usage before you get angry? All AI companies will keep trading places on token use and cost as costs increase. We are frogs in tepid water, pretending it's a bath and that we aren't about to be boiled.

show 8 replies
bryan0 yesterday at 11:20 PM

I see a lot of people struggling to work with agents. This post has a good example:

> “you can’t be serious — is this how you fix things? just WORKAROUNDS????”

If this is how you're interacting with your agents, I think you're in for a world of disappointment. An important part of working with agents is providing specific feedback, and beyond that, making sure this feedback is actually available to them in their context when relevant.

I will ask them why they made a decision and review alternatives with them. These learnings will aid both you and the agent in the future.

show 1 reply
areoform yesterday at 5:31 PM

I've noticed that the same Claude model will sometimes make logical errors and sometimes not. Claude's performance is highly temporal. There's even a graph! https://marginlab.ai/trackers/claude-code/

I haven't seen anyone mention this publicly, but I've noticed that the same model will give wildly different results depending on the quantization. 4-bit is not the same as 8-bit, in compute requirements or in output quality. https://newsletter.maartengrootendorst.com/p/a-visual-guide-...

I'm aware that frontier models don't work in exactly the same way, but I've often wondered if there's a fidelity dial somewhere that's being used to change the amount of memory/resources each model takes during peak hours vs. off hours. Does anyone know if that's the case?
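For intuition on the 4-bit vs. 8-bit point, here's a toy round-trip quantization of a fake weight tensor. This is symmetric round-to-nearest only; real serving stacks use fancier schemes (group-wise scales, GPTQ/AWQ-style calibration), so it illustrates the direction of the effect, not the magnitude:

```python
import numpy as np

def fake_quantize(x, bits):
    """Quantize x to 2**bits signed levels and dequantize back to float."""
    qmax = 2 ** (bits - 1) - 1            # 127 for int8, 7 for int4
    scale = np.abs(x).max() / qmax        # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                      # what the model actually "sees"

weights = np.linspace(-1.0, 1.0, 10_000)  # stand-in for a weight tensor
err8 = np.abs(weights - fake_quantize(weights, 8)).mean()
err4 = np.abs(weights - fake_quantize(weights, 4)).mean()
print(f"int8 mean error: {err8:.5f}")
print(f"int4 mean error: {err4:.5f}")
```

The 4-bit error is roughly an order of magnitude larger, which is why a silently swapped quantization level would be plausible as a "fidelity dial".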

show 1 reply
petterroea yesterday at 4:52 PM

Looking at Anthropic's new products I think they understand they don't really have a cutting edge other than the brand.

I tried Kimi 2.6 and it's almost comparable to Opus. Anthropic dropped the ball. I hope this is a sign that we are moving towards a future where model usage is a commodity, with heavy competition on price/performance.

show 3 replies
ChicagoDave yesterday at 7:13 PM

I think there’s a clear split amongst GenAI developers.

One group is consistently trying to play whack-a-mole with different models/tools and prompt engineering and has shown a sine-wave of success.

The other group, seemingly made up of architects and Domain-Driven Design adherents, has had a straight line of high productivity, generating clean code regardless of model and tooling.

I have consistently advised all GenAI developers to align with that second group, but it’s clear many developers insist on the whack-a-mole mentality.

I have even wrapped my advice in https://devarch.ai/ which codifies how I extract a high level of code quality and the ability to manage a complex application.

Anthropic has done some goofy things recently, but they cleaned it up because we all reported issues immediately. I think it’s in their best interests to keep developers happy.

My two cents.

show 4 replies
lukaslalinsky yesterday at 7:33 PM

I feel like Opus 4.5 was the peak of Claude Code usefulness. It was smart, it was interactive, it was precise. In 4.6 and 4.7, it spends a long time thinking and I don't know what's happening; it often hits a dead end and just continues. For a while I was setting Opus 4.5 in Claude Code, but it got reset often. I just canceled my Max plan, and I don't know where to look for alternatives.

cbg0 yesterday at 4:37 PM

I've been a fan since the launch of the first Sonnet model, and big props for standing up to the government, but you can sure lose that goodwill fast when you piss off your paying customers with bad communication, shaky model quality, and lowered usage limits.

taffydavid today at 7:27 AM

I know this thread is likely full of similar anecdotes, but I also want to share.

My experience very suddenly and very clearly degraded over the last few days.

Today I was trying to build a simple chess game. Previous one-shots were HTML; this one gave me a JSX file. I asked it to convert it to HTML and it absolutely devoured my credits doing so; I had to abort and finish manually. The resulting app didn't work, and it had decided that multiplayer could work by storing the game state only in local storage, without the clients communicating at all.

stldev yesterday at 4:45 PM

Same, after being a long-time proponent too.

First was the CC adaptive-thinking change, then 4.7. Even with `/effort max` and staying under 20% of the 1M context, the quality degradation is obvious.

I don't understand their strategy here.

show 1 reply
siliconc0w yesterday at 4:46 PM

Shameless self-plug, but since I'm also worried about silent quality regressions, I started building a tool to track coding-agent performance over time: https://github.com/s1liconcow/repogauge

Here is a sample report that runs the cheaper models plus the newest Kimi 2.6 model against the 5.4 'gold' test cases from the repo: https://repogauge.org/sample_report.

show 1 reply
binaryturtle yesterday at 5:07 PM

I have a simple rule: I won't pay for that stuff. First they steal all my work to feed into those models, and afterwards I'm supposed to pay for it? No way!

I use AI, but only what is free-of-charge, and if that doesn't cut it, I just do it like in the good old times, by using my own brain.

show 1 reply
mrinterweb yesterday at 5:12 PM

My recent frustration with Claude is that it feels like I'm waiting on responses more. I don't have historical latency numbers to compare against, but I feel like it has been getting slower. I may be wrong, and maybe it's just spending more time thinking than it used to. My guess is Anthropic is having capacity issues. I hope I'm wrong, because I don't want to switch.

show 1 reply
pram yesterday at 4:54 PM

I’ve noticed most of the complaints are about the Pro plan. Anecdotally I pay for the $200 Max plan and haven’t noticed anything radically different re: tokens or thinking time (availability is still a crapshoot)

I am certainly not saying people should "spend more money"; it's more that the Claude Code access in the Pro plan seems kind of like false advertising, since it's technically usable, but not really.

show 2 replies
bauerd yesterday at 4:43 PM

They can't afford to care about individual customers because enterprise demand exploded and they're short on compute

stan_kirdey yesterday at 5:41 PM

I also cancelled my subscription. The $20 Pro plan has become completely unusable for any real work. What is especially frustrating is that Claude Chat and Claude Code now share the exact same usage limits; it makes zero sense from a product standpoint when the workflows are so different. Even the $200 Max plan got heavily nerfed. What used to easily last me a full week (or more) of solid daily use now burns out in just a few days. Combined with the quality drop and unpredictable token consumption, it simply stopped being worth it.

algoth1 yesterday at 4:50 PM

Doesn't "poor support" imply that there is some sort of support? Shouldn't it be "no support"?

show 1 reply
vintagedave yesterday at 5:14 PM

They won't even reset usage for me: https://news.ycombinator.com/item?id=47892445

And by crikey do I empathise with the poor support in this article. Nothing has soured me on Anthropic more than their attitude.

Great AI engineers. Questionable command-line engineers (but highly successful). Downright awful to their customers.

lanthissa yesterday at 4:46 PM

For all the drama, it's pretty clear OpenAI, Google, and Anthropic have all had to degrade some of their products because of a lack of supply.

There's really no immediate solution other than letting the price float or limiting users; as capacity is built out, this gets better.

PeterStuer yesterday at 6:33 PM

I'm on Max x5. No limit problems, but I am definitely feeling the decline. Early stopping and being hellbent on taking shortcuts are the main culprits, closely followed by over-optimistic (stale) caching (audit your hooks!).

All mostly mitigable by rigorous audits and steering, but man, it should not have to be this way.

aucisson_masque yesterday at 10:41 PM

The first time I ever used AI to code was a week ago; I went with Claude Pro because I didn't want to commit.

The $20 plan has incredible value, but the limits are just way too tight.

I'm glad Claude made me discover the strength of AI, but now it's time to poke around and see what is more customer-friendly. I found DeepSeek V4 to be extremely cheap and also just as good.

Plus I get the benefit of using it in VS Code instead of Claude's proprietary app.

I think that once people get over the hype and social pressure, Anthropic will lose quite a lot of customers.

torstenvl yesterday at 6:00 PM

I feel like almost everyone using AI for support systems is utterly failing at the same incredibly obvious place.

The first job of any support system—both in terms of importance and chronologically—is triage. This is not a research issue and it's not an interaction issue. It's at root a classification problem and should be trained and implemented as such.

There are three broad categories of interaction: cranks, grandmas, and wtfs.

Cranks are the people opening a support chat to tell you they have vital missing information about the Kennedy Assassination, or that they want your help suing the government over their exposure to Agent Orange when they were stationed at Minot. "Unfortunately I can't help with that. We are a website that sells wholesale frozen lemonade. Good luck!"

Grandma questions are the people who can't navigate your website. (This isn't meant to be derogatory, just vivid; I have grandma questions often enough myself.) They need to be pointed toward some resource: a help page, a kb article, a settings page, whatever. These are good tasks for a human or LLM agent with a script or guideline and excellent knowledge/training on the support knowledge base.

WTFs are everything else. Every weird undocumented behavior, every emergent circumstance, every invalid state, etc. These are your best customers and they should be escalated to a real human, preferably a smart one, as soon as realistically possible. They're your best customers because (a) they are investing time into fixing something that actually went wrong; (b) they will walk you through it in greater detail than a bug report, live, and help you figure it out; and (c) they are invested, which means you have an opportunity for real loyalty and word-of-mouth gains.

What most AI systems (whether LLMs or scripts) do wrong is that they treat WTFs like they're grandmas. They're spending significant money on building these systems just to destroy the value they get from the most intelligent and passionate people in their customer base doing in-depth production QC/QA.
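As a sketch of the classification framing (keyword matching with made-up word lists; a real deployment would train an actual text classifier on labeled tickets, but the routing decision has this shape):

```python
# Toy triage router for the three categories named above.
# The keyword lists are invented for illustration only.
CRANK_WORDS = {"conspiracy", "assassination", "agent orange", "aliens"}
GRANDMA_WORDS = {"password", "login", "how do i", "where is", "find"}

def triage(message: str) -> str:
    text = message.lower()
    if any(w in text for w in CRANK_WORDS):
        return "crank"      # polite brush-off, no escalation
    if any(w in text for w in GRANDMA_WORDS):
        return "grandma"    # point at docs/KB; fine for an LLM agent
    return "wtf"            # everything else: escalate to a human

print(triage("how do i reset my password"))
print(triage("I have proof of a government conspiracy"))
print(triage("checkout silently charges twice with an empty cart"))
```

Note that the default branch is "wtf", not "grandma" — which is exactly the inversion most deployed systems get wrong.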

show 1 reply
lawrence1 yesterday at 6:22 PM

The timeline doesn't make any sense. How can you have subscribed a couple of weeks ago, with the problem starting 3 weeks ago, and yet things also went well for the first few weeks? Was this written by GPT 5.5?

show 2 replies
arikrahman today at 12:05 AM

I use Aider nowadays, and will probably cancel my GitHub multi-AI bundle subscription due to the new training policy. I find that using Aider with the new open models, and using Open Spec to negotiate requirements before the handoff, has helped me a lot.

zulban yesterday at 10:32 PM

Curious. Not my experience whatsoever.

I tried Claude recently and it was able to one-shot fixes for 9/9 of the bugs I gave it on my large, older Unity C# project. Only 2/9 needed minor tweaks for personal style (functionally the same).

Maybe it helps that I separately have a CLI with very extensive unit tests. Or that I just signed up. Or that I use Claude late in the evenings (off hours). I also give it very targeted instructions and if it's taking longer than a couple minutes - I abort and try a different or more precise prompt. Maybe the backend recognizes that I use it sparingly and I get better service.

The author describes what sounds like very large tasks that I'd never hand off to an AI to run wild in 2026.

Anyway I thought I'd give a different perspective than this thread.

joozio yesterday at 6:54 PM

Funny, I thought I was the only one. Then I found more people, and now you've written about it. Just this week I also wrote about Claude Opus 4.7 and how I went back to Codex after that: https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026

show 1 reply
vondur yesterday at 4:55 PM

Wait, weren't there posts in the not-too-distant past where everyone was singing the praises of Claude and wondering how OpenAI would catch up?

show 2 replies
vivin yesterday at 8:51 PM

This is interesting to me, because Claude has been a net-positive for me. I haven't faced token issues or declining quality. I generally work with Claude as an assistant -- I may have it do planning and have it "one shot" a thing, but it's usually a personal tool or a utility that I want it to write.

For actual code that goes out to production, I generally figure out how I would solve the problem myself (but will use Claude to bounce ideas and approaches -- or as a search engine) and then have Claude do the boring bits.

Recently I had to migrate a rules-engine into an FSM-based engine. I already had my plan and approach. I had Claude do the boring bits while I implemented the engine myself. I find that Claude does best when you give it small, focused, incremental tasks.

isjcjwjdkwjxk yesterday at 4:51 PM

Oh no, the unreliable product people pretend is the next coming of Jesus turned out to be thoroughly unreliable. Who coulda thunk it.

datavirtue today at 11:43 AM

I have enterprise plans for all AI services except Google. GitHub Copilot in VS Code is the best I have used so far. I hear a lot of complaints from people who are holding it wrong. In a single day I can have a beautiful greenfield app deployed. One dev. One day. Something that would have taken weeks with two teams bumping into each other. It's fully documented. Beautiful code. I read the reasoning prompts as it flows by to get an idea of what is going on. I work in phases and review the code and working product quickly after that. Minimal issues.

I'm an executive, the devs complaining are getting retrained or put on the chopping block.

My rockstars are now random contractor devs from Vietnam. The aloof FTE greybeards saying "I don't know, it doesn't work very well on X" are getting a talking-to or being sidelined/canned. So far most of my greybeards are adapting pretty well.

I'm not waiting on people to write code any more. No way in hell.

easythrees yesterday at 4:29 PM

I have to say, this has been the opposite of my experience. If anything, I have moved over more work from ChatGPT to Claude.

show 1 reply
burnJS yesterday at 7:10 PM

My experience is that Claude and others are good at writing methods and smaller units, because you can dictate what it should do in fewer tokens and easily read the code. This closes the feedback loop for me.

I occasionally ask AI to write lots of code, such as a whole feature (>= medium shirt size) or sometimes even bigger components of said feature, and I often just revert what it generated. It's not good, for all the reasons mentioned.

Other times I accept its output as a rough draft and then tell it how to refactor its code from mid to senior level.

I'm sure it will get better but this is my trust level with it. It saves me time within these confines.

Edit: it is a valuable code reviewer for me, especially as a solo stealth startup.

brunooliv yesterday at 11:04 PM

I still haven't seen any other model be as complete as Claude inside Claude Code. I bet Anthropic knows this, and they turn the knobs and watch people's reactions… I have been planning with Qwen3.6 Max inside opencode; absolute game changer. Opus can then follow the quite detailed plan, and like this I can make progress on my toy apps on the Pro plan at $20/mo.

For work, unlimited usage via Bedrock.

Yes, I'd like to get more usage out of my personal sub, but at $20/mo, no complaints.

airbreather yesterday at 9:06 PM

I am sort of in the same place; it seems to have lost enough of the magic that I might be better off trying to do more with local LLMs on my 4090.

The thing is, running local LLMs gives some kind of reliability and fixed expectations, which saves a lot of time. Sure, Claude might be fantastic one day, but what do I do when the same workload churns out shit the next day and I'm halfway through updating and referencing a 500-document set?

Better the devil you know and all that.

duxup yesterday at 7:46 PM

I’ve definitely encountered a drop in Claude quality.

Even on a simple prompt focused on two files, I told Claude to do a thing to file A and not change file B (we were using it as a reference).

Claude’s plan was to not touch file B.

First thing it did was alter file B. An astonishingly simple task, and a total failure.

It was all of one prompt, simple task, it failed outright.

I also had it declare that some function did not have a default value, and then explain what the function does and how it defaults to a specific value…

Fundamentally absurd failures that have seriously impacted my level of trust with Claude.

throwaway2027 yesterday at 4:31 PM

Same. I think one of the issues is that Claude reached a threshold where I could just rely on it being good, and I had to manually fix things up less and less. Other models hadn't reached that point yet, so with them I knew I had to fix things up or do a second pass or more. Other providers also move you to a worse model after you run out, which is key in setting expectations as well. Developers knew that was the trade-off.

I think even with the worse limits people still hated it, but when you start to make the model dumber, either on purpose or inadvertently, that's when there's really no reason to keep using Claude anymore.

nikolay yesterday at 5:24 PM

I can agree. ChatGPT 5.5 made this a no-brainer choice. Anthropic are idiots for removing Claude Code from the Pro plan. They need to ask Claude if what they did was a natural-intelligence bug! Greed kills companies, too!

show 2 replies
0xchamin yesterday at 10:23 PM

One of the biggest problems with Claude is that it tries to do things I don't even ask for. I really like to have full control over what I do. Sometimes I feel Claude has this urge to keep going with whatever it is hardwired to do instead of waiting for my feedback. It looks like Claude considers everything to be a one-shot. I may be wrong; this is my personal experience.

show 1 reply
chaosprint yesterday at 10:16 PM

I bought a Claude membership a few days ago. I asked it to fix a React issue, a very simple UI modification with almost no logic. It still failed to understand it. And after three attempts, the 5-hour limit was reached. This was a disaster. I had to immediately buy a Codex membership and also tried Image2. I won't give Claude another chance.

show 1 reply
dostick yesterday at 7:00 PM

Discussions about Claude always omit important context: which language/platform you're using it for. It is best trained on web languages and has the most up-to-date knowledge there. If you use it for Swift, it is trained on a whole landfill of code, and that gives you a strong bias towards pre-Swift 6 output. Imagine you gave Claude requirements for a web app and it implemented it all in jQuery. That's what happens with other platforms.

show 1 reply
rurban yesterday at 7:58 PM

That's bad for him, because he already had a cheap plan. Now he won't get it back that easily.

Pro is gone. OpenAI plans are more expensive. He can only buy a Kimi plan, which is at least better than Sonnet. But frontier-for-cheap is gone. Even Copilot business plans are getting very expensive soon, also switching to API usage only.

lawrence1 yesterday at 6:20 PM

The timeline of the first few sentences doesn't add up. How can you subscribe 2 weeks ago when the problem started 3 weeks ago?

show 1 reply
Animats today at 3:59 AM

Support? You expected support? Live support?

Most of this is about the billing system, which is apparently broken.

kx_x yesterday at 7:56 PM

After the fixes in Claude Code, Opus 4.6/4.7 have been performing well.

Before the fixes, they were complete trash and I was ready to cancel this month.

Now, I'm feeling like the AI wars are back -- GPT 5.5 and Opus 4.7 are both really good. I'm no longer feeling like we're using nerfed models (knock on wood)!
