Hacker News

Pro Max 5x quota exhausted in 1.5 hours despite moderate usage

717 points | by cmaster11 | yesterday at 1:15 PM | 635 comments

Comments

bcherny | yesterday at 3:02 PM

Hey all, Boris from the Claude Code team here.

We've been investigating these reports, and a few of the top issues we've found are:

1. Prompt cache misses when using the 1M token context window are expensive. Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour and then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (e.g. to nudge you to /clear before continuing a long stale session), and are investigating defaulting to 400k context instead, with an option to configure your context window up to 1M if preferred. To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude.

2. People pulling in a large number of skills, or running many agents or background automations, which sometimes happens when using a large number of plugins. This was the case for a surprisingly large number of users, and we are actively working on (a) improving the UX to make these cases more visible to users and (b) more intelligently truncating, pruning, and scheduling non-main tasks to avoid surprise token usage.
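The cache-miss cost in point 1 can be made concrete with some back-of-envelope arithmetic. The prices below are illustrative placeholders, not Anthropic's actual rates; the only assumption carried over from the comment is that cached reads are much cheaper (here ~10x) than fresh input tokens:

```python
# Rough input-side cost of one turn after a full prompt-cache miss vs. a
# warm cache. Per-million-token prices are ASSUMED for illustration only.
FRESH_INPUT_PER_M = 15.00   # $/M input tokens (assumed)
CACHE_READ_PER_M = 1.50     # $/M cached-read tokens (assumed, ~10x cheaper)

def turn_cost(context_tokens: int, cache_hit: bool) -> float:
    """Input-side cost of re-sending `context_tokens` on one turn."""
    rate = CACHE_READ_PER_M if cache_hit else FRESH_INPUT_PER_M
    return context_tokens / 1_000_000 * rate

# Resuming an 800k-token session after the 1-hour cache window expires
# re-sends the whole context at the fresh rate:
print(f"cold: ${turn_cost(800_000, cache_hit=False):.2f}")  # cold: $12.00
print(f"warm: ${turn_cost(800_000, cache_hit=True):.2f}")   # warm: $1.20
```

Under these assumed prices, a single stale resume costs roughly 10x a warm turn, which is why a smaller default window (or a /clear before resuming) makes such a difference.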

In the process, we ruled out a large number of hypotheses: adaptive thinking, other kinds of harness regressions, model and inference regressions.

We are continuing to investigate and prioritize this. The most actionable thing for people running into this is to run /feedback, and optionally post the feedback IDs either here or in the GitHub issue. That makes it possible for us to debug specific reports.

chandureddyvari | yesterday at 1:49 PM

Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect. Then 30 minutes later I hit session limits. Three sessions like that in a day, and suddenly 25% of the weekly limit is gone.

I ended up buying the $100 Codex plan. So far it has been much more generous with usage and more accurate than Claude for the kind of work I do.

That said, Codex has its own issues. Its personality can be a bit off-putting for my taste. I had to add extra instructions in Agents.md just to make it less snarky. I was annoyed enough that I explicitly told it not to use the word “canonical.”

On UI/UX taste, I still think current Codex is behind the Jan/Feb era of Claude Code. Claude used to have much better finesse there. But for backend logic, hard debugging, and complex problem-solving, Codex has been clearly better for me. These days I use Impeccable Skillset inside Codex to compensate for the weaker UI taste, but it still does not quite match the polish and instinct Claude Code used to have.

I used to be a huge Claude Code advocate. At this point, I cannot recommend it in good conscience.

My advice now is simple: try the $20 plans for Codex and Cursor, and see which one best matches your workflow and vibes.

SkyPuncher | yesterday at 2:25 PM

I skimmed the issue. No wonder Anthropic closes these tickets out without much action. That’s just a wall of AI garbage.

Here’s what I’ve done to mostly fix my usage issues:

* Turn on max thinking for every session. It saves tokens overall because I'm not correcting it or letting it waste energy on dead paths.

* Keep active sessions active. It seems like caches are expiring after ~5 minutes (especially during peak usage). When the caches expire, it seems like all the tokens need to be rebuilt; this gets especially bad as token usage goes up.

* Compact after 200k tokens, as soon as I reasonably can. I have no data, but my usage absolutely skyrockets as I get into longer sessions. This is the most frustrating thing, because Anthropic forced the 1M model on everyone.

geeky4qwerty | yesterday at 1:49 PM

I'm afraid the music may be slowly fading at this party, and the lights will soon be turned on. We may very well look back on the last couple of years as the golden era of subsidized GenAI compute.

For those not in the Google Gemini/Antigravity sphere, over the last month or so that community has been experiencing nothing short of contempt from Google when attempting to address an apparent bait and switch on quota expectations for their pro and ultra customers (myself included). [1]

While I continue to pay for my Google Pro subscription, probably out of some Stockholm Syndrome-level loyalty and false hope that it is just a bug and not Google being Google and self-immolating a good product, I have since moved to Kiro for my IDE and Codex for my CLI, and am as happy as a clam with this new setup.

[1] https://github.com/google-gemini/gemini-cli/issues/24937

jameson | today at 4:27 AM

I'm noticing a fair amount of degradation in Claude's infrastructure recently, and it makes me wonder why they can't use Claude to identify or fix these issues in advance.

It seems counterintuitive given Anthropic's message that Claude uncovered bugs in an open source project.*

[*] https://www.anthropic.com/news/mozilla-firefox-security

comandillos | yesterday at 1:26 PM

Quite scared by the fact that the original issue pointing out the actual root cause has been 'Closed as not planned' by Anthropic.

https://github.com/anthropics/claude-code/issues/46829

WarmWash | yesterday at 3:25 PM

I did my (out of the ordinary) taxes this year using agents, kind of as an experiment and kind of to save ~$750. Opus 4.6 max in CC, 5.4 xhigh in codex, and 3.1 high in antigravity. All on the $20/mo plans.

I have a day job, a side business, actively trade shares options and futures, and have a few energy credit items.

All were given the same copied folder containing all the needed documents to compose the return, and all were given the same prompt. My goal was that if all three agreed, I could then go through it pretty confidently and fill out the actual submission forms myself.

5.4 nailed it on the first shot. Took about 12 minutes.

3.1 missed one value, because it decided to load only the first 5 pages of a 30 page document. Surprisingly, it took only about 2 minutes to complete, though. A second prompt and ~10 seconds corrected it. GPT and Gemini were now perfectly aligned in their outputs.

4.6 hit my usage limit before finishing after running for ~10 minutes. I returned the next day to have it finish. It ran for another 5 minutes or so before finishing. There were multiple errors and the final tax burden was a few thousand off. On a second prompt asking to check for errors in the problem areas, it was able to output matching values after a couple more minutes.

For my first time using CC and 4.6 (outside of some programming in AG), I am pretty underwhelmed given the incessant hype.

oldnewthing | yesterday at 2:48 PM

If this helps, I rolled back to version 2.1.34. Here is the ~/.claude/settings.json blurb I added:

  "effortLevel": "high",
  "autoUpdatesChannel": "stable",
  "minimumVersion": "2.1.34",
  "env": {
    "DISABLE_AUTOUPDATER": "1",
    "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING": "1"
  }
I also had to:

1. Nuke all other versions within ~/.local/share/claude/versions/ except 2.1.34.

2. Link ~/.local/bin/claude to ~/.local/share/claude/versions/2.1.34.

This seems to have fixed my running-out-of-quota-quickly problems. I have periods of intense use (nights, weekends) and no use (day job). Before these changes, I was running out of quota rather quickly. I am on the same $100 plan.

I am not sure the adaptive thinking setting is relevant for this version, but it will help in the future once they fix all the quota and cache issues. Seriously thinking about switching to Codex, though. Gemini is far behind, from what I have tried so far.

wg0 | yesterday at 1:25 PM

Been experiencing similar issues even with the lower tier models.

Fair transactions involve fair and transparent measurements of goods exchanged. I'm going to cancel my subscription this month.

delbronski | yesterday at 2:06 PM

Ever since this change they announced:

https://www.reddit.com/r/ClaudeAI/comments/1s4idaq/update_on...

It’s been unusable for me as my daily coding agent. I run out of credits in the pro account in an hour or so. Before that I had never reached the session limit. Switched back to Junie with Gemini/chatgpt.

meetingthrower | yesterday at 1:55 PM

I don't get it. Last week on the 100 bucks plan I generated probably 50k LOC (not a quality measure for sure!) and just barely kissed the weekly limit. I did get rate limited on some sessions for sure, but that's to be expected.

I'm curious what people are doing that consumes these limits. I can't imagine filling the $200 a month plan unless I was essentially using Claude Code itself as the API to mass-process stuff. For basic coding, what are people doing?

pxc | yesterday at 1:40 PM

It's a bit shocking to me how opaque the pricing for the subscription services by the frontier labs is. It's basically impossible for people to tell what they're actually buying, and difficult to even meaningfully report or compare experiences.

How is this normal?

tedivm | yesterday at 1:32 PM

Something similar is happening with GitHub Copilot too. It's impossible to know what a "request" is, and some change in the last couple of months has seen my request usage go up for the same style of work. Toss in the bizarre and impossible-to-understand rate limiting that occurs with regular usage, and it's pretty obvious that these companies are struggling to scale.

themantalope | yesterday at 7:26 PM

I've switched to opencode and OpenRouter.

I've only had the $20/month subscription, since 9/2025.

It was great for about 5 months, amazing in fact. I underutilized it.

For the past month, it's been basically unusable, both Claude Code and plain Claude chat. 1-2 prompts and I'm out. Last week I probably sent a total of 15 messages to Claude and was out of daily and weekly usage each day.

I get that the $20/month subscription isn’t a money maker for them, and they probably lose money. But the experience of using Claude has been ruined

MeetingsBrowser | yesterday at 1:33 PM

I pay for the lowest plan. I used to struggle to hit my quota.

Now a single question consistently uses around 15% of my quota

GodelNumbering | yesterday at 2:16 PM

In anticipation of a future where:

a) quotas will get restricted

b) the subscription plan prices will go up

c) all LLMs will become good enough at coding tasks

I just open sourced a coding agent https://github.com/dirac-run/dirac

The entire goal is to be token efficient (over 50% cheaper) and, by extension, to take advantage of LLMs' better reasoning at shorter context lengths.

This really started as an internal side project that made me more productive, I hope it will help others too. Apache 2.0

Currently it still can't compete with the subsidized coding-plan rates using Anthropic API pricing (even though it beats CC when both use an API key), which tells me that all subscription-plan operators are losing money on such plans.

cmaster11 | yesterday at 1:15 PM

For whoever else is having the same problems, it's worth upvoting these kinds of issues. There needs to be more transparency about what goes on with our subscriptions.

hgoel | yesterday at 4:46 PM

I've experienced none of the problems I've seen people complaining about here (5x plan), Claude has been working pretty well and I've been using it constantly without exhausting any of my quotas.

Yet, there must obviously be something different for so many people to be reporting these issues.

I feel for the Anthropic devs who have to deal with this: figuring out what setup everyone has and what their usage patterns are in order to filter out the valid reports, and then also dealing with the backlash from people who were just setting off obvious footguns, like having a ton of skills/MCPs polluting their context window.

weavie | yesterday at 2:06 PM

How good are local LLMs at coding these days? Does anyone have any recommendations for how to get this setup? What would the minimum spend be for usable hardware?

I am getting bored of having to plan my weekends around quota limit reset times...

time4tea | today at 2:04 PM

Cancelled today after responses became code soup, skills were ignored completely, and, in response to a question, it told me "it's A, no that's wrong, it's B, no actually I don't know, please look for the answer".

Something materially changed in the last 4 weeks.

Also, see the made-up boosterism about finding security holes everywhere. It's just fanning the flames of the industry's worries about all the stupid account takeovers.

zkmon | yesterday at 1:43 PM

Unless the agent code is open sourced, there is hardly any transparency into how the agent is spending your tokens or how it calculates them. It's like asking your lawyer why they charged some amount.

Nic0 | yesterday at 1:46 PM

Am I alone in thinking that it has become slower than usual to get responses?

nickstinemates | yesterday at 2:48 PM

It feels so weird to me - people are exhausting their quotas while I am trying very hard to even reach mine with the $200 plan.

We're generating all of the code for swamp[1] with AI. We review all of that generated code with AI (this is done with the Anthropic API). Every part of our SDLC is pure AI + compute. Many feature requests every day. Bug fixes, etc.

Never hit the quota once. Something weird is definitely going on.

1: https://github.com/systeminit/swamp

yalogin | today at 2:05 AM

So this is trending towards new prices and quotas, just like Netflix pricing. Either the cost of this infra is high, or they have realized they have hit a tipping point in usage where they can raise prices and people will pay, just like Netflix.

0xbadcafebee | yesterday at 4:05 PM

Please remember you do not need Anthropic. There are cheaper subscriptions with higher rate limits. Comparison of subscriptions to API: https://codeberg.org/mutablecc/calculate-ai-cost/src/branch/... Score/price comparison: https://benchlm.ai/llm-pricing

Opus is not worth the moat; there are multiple equivalent models, GLM 5.1 and Kimi K2.5 being the open ones, GPT 5.4 and Gemini 3.1 Pro being closed. https://llm-stats.com/ https://artificialanalysis.ai/leaderboards/models https://benchlm.ai/

Even API use (comparatively expensive) can be cheaper than Anthropic subscriptions if you properly use your agents to cache tokens, do context-heavy reading at the beginning of the session, and either keep prompt cache alive or cycle sessions frequently. Create tickets for subagents to do investigative work and use smaller cheaper models for that. Minimize your use of plugins, mcp, and skills.

Use cheaper models to do "non-intelligent" work (tool use, searching, writing docs/summaries) and expensive models for reasoning/problem-solving. Here's an example configuration: https://amirteymoori.com/opencode-multi-agent-setup-speciali... A more advanced one: https://vercel.com/kb/guide/how-i-use-opencode-with-vercel-a...
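The tiered-model idea above can be sketched in a few lines. This is a minimal illustration, not a real configuration: the model names and task labels are hypothetical placeholders, and a real setup would route via the agent's own config (as in the linked guides) rather than code like this:

```python
# Minimal sketch of tiered routing: cheap model for "non-intelligent"
# chores, expensive model for reasoning. All names are hypothetical.
CHEAP_MODEL = "small-fast-model"         # hypothetical placeholder
REASONING_MODEL = "big-reasoning-model"  # hypothetical placeholder

# Chores the comment suggests delegating to a cheaper model.
CHORES = {"tool_use", "search", "write_docs", "summarize"}

def pick_model(task_kind: str) -> str:
    """Route a task to the cheapest model that can plausibly handle it."""
    return CHEAP_MODEL if task_kind in CHORES else REASONING_MODEL

print(pick_model("search"))     # small-fast-model
print(pick_model("debugging"))  # big-reasoning-model
```

The design point is simply that most agent turns are chores, so pushing them to a cheaper tier cuts the bill without touching the hard reasoning steps.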

bushido | yesterday at 4:02 PM

Tangentially related to some of the issues a lot of people are facing, especially the ones where Claude keeps rechecking/scanning the same files over and over.

Ask Claude Code to give you all the memories it has about you in the codebase and prune them. There is a very high chance that you have memories in there that contradict each other and cause bad behavior. Auto-saved memories are a big source of pollution and need to be pruned regularly. I almost don't let it create any memories at all if I can help it.

Disclaimer: I'm also burning through usage very quickly now - though for different reasons. Less than 48 hours to exhaust an account, where it used to take me 5-6 days with the same workload.

voisin | yesterday at 2:52 PM

It is pretty obvious to me that Anthropic wasn’t prepared with sufficient infrastructure to handle the wave of OpenAI/DoD refugees. Now everyone is getting throttled excessively and Claude is essentially unusable beyond chatting. Their big new release of Cowork is even worse than Claude Code for blasting through session limits.

I am tired of all the astroturf articles meant to blame the user with “tips” for using fewer tokens. I never had to (still don’t) think of this with Codex, and there has been a massive, obvious decline between Claude 1 month ago and Claude today.

rzkyif | yesterday at 2:06 PM

My personal experience is way different: I struggle to burn through more than 50% of the 5 hour limit

For context, with Google AI Pro, I can burn through the Antigravity weekly limit in 1-2 hours if I force it to use Gemini 3.1 Pro. Meanwhile, Gemini 3 Flash is basically unlimited but frequently produces buggy code or fails to implement things how I personally would (it feels like it doesn't "think" like a software dev).

I also tried VS Code + Cline + OpenRouter + MiniMax M2.7. It's quite cheap and seems to be better than Gemini 3 Flash, but it gets really pricy as the context fills up because prompt caching is not supported for MiniMax on OpenRouter. The result itself usually needs 3-6 revisions on average so the context fills up pretty often

Eventually I got Claude Max 5x to try for a month. VS Code + Claude Code extension on a ~15k lines codebase, model set to "Default", and effort set to "Max". So far it's been really good: 0-2 revisions on average, and most of the time it implements things exactly how I would or better. And, like I said, I can only consume 40-60% of the 5-hour limits no matter how hard I try

Granted, I'm not forcing it to use Opus like OP (nor do I use complicated skills or launch multiple tasks at the same time), but I feel like they really nailed the right balance of when to use which model and how to pass context between them. Or at least enough that I haven't felt the need to force it to use Opus all the time.

wolvoleo | yesterday at 1:47 PM

Yeah, Perplexity used to be great, but they've also clamped down on the 20€ plan. A single deep research query was enough to block me until the end of the month.

The thing is, if it's going to be this expensive, it's not going to be worth it for me. Then I'd rather do it myself. I'm never going to pay for a €100 subscription; that's insane. It's more than my monthly energy bill.

Maybe from a business standpoint it still makes sense because you can use it to make money, but as a consumer no way.

sailingcode | yesterday at 2:03 PM

I had the Max plan and never reached its limit despite constantly working. Now I use the Pro plan and regularly reach the 5h limit as well as the weekly limit, as expected. I found that it makes a huge difference if you provide clear context when developing code. If you leave room for interpretation, Claude Code uses up tokens much faster than in a well-defined context. The same goes for its response time, which gets longer if there isn't much documentation about the project.

anonfunction | yesterday at 2:25 PM

A little off topic, but did Anthropic distill from an older OpenAI model? All of a sudden, over the last few days, I'm getting a ton of em dashes in Claude Code responses!

mchinen | yesterday at 2:16 PM

I've been feeling the squeeze too. I've tried switching between different models as a test, I can at least say it feels like the limits are about half of what they used to be a few months ago. I'd be totally willing to concede that this is just my perception if Anthropic would only release some tools for measuring your usage.

In theory the /stats command tells you how many tokens you've used, which you could use to compute how much you are getting for your subscription. In practice it doesn't contain any useful info; it may be counting only what is printed to the terminal or something. My stats suggest my Claude Code usage is a tiny number of tokens, so either that count is an extreme underestimate, or they are charging much more per token on the subscription than on the API (which is not supposed to be the case).

Last week's free extra usage quota shed some light on this. It seems like the reported tokens are probably between 1/30th and 1/100th of the actual tokens billed, judging from how they billed it: /stats went up 10k tokens and I was billed $7.10, while with the API it should be $25 for a million tokens.
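The undercount ratio implied by those numbers is easy to check. Taking the comment's figures at face value ($7.10 billed against a /stats delta of ~10k tokens, and the quoted $25-per-million API rate):

```python
# Sanity-check the ratio implied by the comment's numbers: /stats moved
# ~10k tokens while the bill was $7.10, against the quoted API rate of
# $25 per million tokens.
reported_tokens = 10_000
billed_dollars = 7.10
api_rate_per_m = 25.0

implied_rate_per_m = billed_dollars / reported_tokens * 1_000_000  # ~$710/M
ratio = implied_rate_per_m / api_rate_per_m
print(round(ratio, 1))  # 28.4
```

So /stats would be reporting roughly 1/28th of the tokens actually billed, which lands right at the 1/30th end of the range the comment estimates.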

auggierose | today at 11:44 AM

I switched to Codex, it's a monster compared to Opus.

agrippanux | yesterday at 3:45 PM

For me, iterating with Claude begins to degrade at 200k context used, by 350k it’s crossed-fingers time, by 500k it’s essentially useless. Starting a fresh context after 300k is usually the best move imho. I wonder if people are hitting a case where Claude becomes both dumb and increasingly more expensive, essentially a doom loop.

siliconc0w | yesterday at 2:42 PM

Switched back to Codex for the promotion. Opus at the start of the year was the GOAT, just relentless at chewing through hard problems. Now it spins on pretty easy work (it took three swings just to edit a TS file) and my session is like 1-3 prompts (I downgraded to the $20 plan, but still).

postalcoder | yesterday at 1:34 PM

I used Claude Code Max as my daily driver last year, and this sort of drama was par for the course. It's why I migrated entirely to Codex, despite liking Claude, the harness, more.

There's this honeymoon period with Claude you experience for a month or two followed by a trough of disillusionment, and then a rebound after a model update (rinse and repeat). It doesn't help that Anthropic is experiencing a vicious compute famine atm.

danbots | yesterday at 2:59 PM

Codex can feel standoffish at times. I can tell very quickly we won't become friends. The personality feels like an employee in another department who, while gifted, is merely lending you a slice of their clearly precious time. I get the impression from Codex that I am wasting its time. That it will help me, but deep down it does not want to, and it does not care if we succeed together. What I am saying, friends, is that when I use Codex and iterate, I get the impression that Codex does not like me, that deep down it truly does not want to help me, that it has better things to do.

On the flip side, using Opus with a Baby Billy Freeman persona has never been more entertaining.

bob1029 | yesterday at 2:23 PM

I've got a dual path system to keep costs low and avoid TOS violations.

For general queries and investigation I will use whatever public/free model is available without being logged in. Not having a bunch of prior state stacked up all the time is a feature for me. This is essentially my google replacement.

For very specific technical work against code files, I use prepaid OAI tokens in VS copilot as a "custom" model (it's just gpt5.4).

I burn through maybe $30 worth of tokens per month with this approach. A big advantage of prepaying for the API tokens is that I can look at everything copilot is doing in my usage logs. If I use the precanned coding agent products, the prompts are all hidden in another layer of black box.

danbots | yesterday at 2:56 PM

Codex can feel standoffish at times. I can tell very quickly we won't become friends. The personality feels like an employee in another department who, while gifted, is merely lending you a slice of their clearly precious time. I get the impression from Codex that I am wasting its time. That it will help me, but deep down it does not want to, and it does not care if we succeed together. What I am saying, friends, is that when I use Codex and iterate, I get the impression that Codex does not like me, that deep down it truly does not want to help.

For something I spend all my time using, I'd rather iterate with Claude. The personality makes a big difference to me.

pks016 | yesterday at 6:19 PM

In case the Claude team cares about feedback on the free model:

I've been using the free model via chat from the beginning, and this is the first time I'm seriously considering moving away from Claude. Until last month, Claude's Sonnet model was consistent in quality, but now the responses are all over the place. It's hard to replicate the issue, as it only happens once in a while. I rarely encountered hallucinations from Claude models on questions from my domain, but since last month I have observed an abundance of them.

mridulmalpani | yesterday at 2:34 PM

I have used Claude extensively until now, and just tested Gemini 3.1 Pro yesterday via AI Studio. They don't offer it in the Gemini CLI; I don't know why.

Taking a second opinion has significantly helped me design the system better, and it helped me uncover my own and Claude's blind spots.

Also, I agree that it spends and wastes a lot of tokens on web search, and often gets stuck in loops.

Going forward, I will always use all three of them. My main coding agent is still Claude for now, but I'm happy to see this field evolving so fast; it's easy to switch and use the others on the same project.

No network effects or lock-in for customers. Great to live in this period of time.

ianberdin | yesterday at 3:14 PM

Yesterday I hit the 5h window limit for the first time. I was surprised. Max 20x plan. Usually I work 12-15 hours per day, 7 days a week, with no limits, but yesterday it took under 3 hours... what a pity.

ofjcihen | yesterday at 5:01 PM

Been running into the same issue since a week or 2 ago on Opus.

To be fair, I have a pretty loose harness and pattern, but it's been enough to pull in 20k in bounties a month for a long time without going over plan, with very little steering (sometimes days of continuous work).

That being said I’ve figured this was coming for a long time and have been slowly moving to local models. They’re slower but with the right harnesses and setup they’re still finding much the same amount in bounties.

kirby88 | yesterday at 3:20 PM

I've been building an AI coding agent that uses the exact same prompt as Claude Code, but adds a virtual filesystem to minify source code, plus the concept of stem agents (general agents that specialize during the conversation for maximum cache hits). The results on my modest benchmark: 50% of Claude Code's cost and 40% of the time. https://github.com/kirby88/vix-releases

hyperionultra | yesterday at 1:52 PM

Vote with your wallet. The voting continues until the product improves or dies.

aeneas_ory | yesterday at 2:06 PM

Besides some of the obvious hacks to reduce token usage, properly indexed code bases (think IntelliJ) reduce token usage significantly (30%-50%, while keeping or exceeding result quality compared with the baseline), as shown with https://github.com/ory/lumen

Anthropic is not incentivized to reduce token use, only to increase it, which is what we are seeing with Opus 4.6. And now they are putting the screws on.

anonyfox | yesterday at 5:52 PM

Essentially, I am also now using Sonnet instead of Opus as the default most of the time. Even a single-project coding session with Opus, without any external plugins or skills, won't make it to the 5hr mark now before the limits kick in. And the weekly limit seems even more brutal now: easily reaching 50%+ in ~2 days... with mostly Sonnet! On the highest 20x plan!

mrbonner | yesterday at 4:24 PM

An unverifiable software stack, now amplified with LLM nondeterminism. This whole thing starts to feel like we are building on top of a giant house of cards!

bojangleslover | yesterday at 2:35 PM

That's weird, I'm on the $100/mo and I use it for around 2-4hrs a day often with multiple terminal windows and I never even hit 20% of my quota.

algoth1 | yesterday at 2:04 PM

Wasn't Anthropic previously offering double the token usage outside busy hours? Now they are counting tokens at the normal rate again. But yeah, it's not good. I use Codex because Claude insists on peeking at and messing with folders and files outside its work area, though.
