Claude Code is the most prompt-cache-efficient harness, I think. The issue is more that the larger the context window, the higher the cost of a cache miss.
Politely, no.
- I wrote an extension in Pi to warm my cache with a heartbeat.
- I wrote another to block submission once the cache had expired (heartbeats disabled or exhausted).
- I wrote a third to hard-limit my context window.
- I wrote a fourth to handle cache-control placement before forking context for fan-out.
- My initial prompt was 1,000 tokens, which improves cache efficiency.
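The heartbeat and submission-blocking logic in the first two bullets can be sketched roughly like this. This is a minimal sketch, not the actual extension: the class name, the 30-second margin, and the injectable clock are all illustrative; the 5-minute figure is Anthropic's default ephemeral prompt-cache TTL.

```python
import time

CACHE_TTL_S = 5 * 60      # Anthropic's default ephemeral prompt-cache TTL
HEARTBEAT_MARGIN_S = 30   # send a keep-alive this long before expiry (assumed margin)

class CacheTracker:
    """Tracks when the cached prefix was last touched, and decides whether to
    fire a cheap keep-alive request (warm the cache) or block a submission
    because the cache has already gone cold (a full-cost miss)."""

    def __init__(self, now=time.monotonic):
        self._now = now          # injectable clock, makes the logic testable
        self._last_touch = None  # None = nothing cached yet

    def touch(self):
        # Call after every request that reads or (re)writes the cached prefix.
        self._last_touch = self._now()

    def should_heartbeat(self):
        # True inside the window just before the TTL lapses.
        if self._last_touch is None:
            return False
        age = self._now() - self._last_touch
        return CACHE_TTL_S - HEARTBEAT_MARGIN_S <= age < CACHE_TTL_S

    def is_expired(self):
        # Used to block submission once the cache is cold.
        if self._last_touch is None:
            return True
        return self._now() - self._last_touch >= CACHE_TTL_S
```

The heartbeat itself would be a minimal API call that re-reads the cached prefix; the point of the tracker is just deciding *when* to send it and when to refuse a doomed full-price submission.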
Anthropic is STOMPING on the diversity of use cases of their universal tool, see you when you recover.
That may be, but the argument was that poor cache utilization was costing Anthropic too much money in other harnesses. If cache usage is counted against rate limits, it doesn't matter from a cost perspective; you'll just hit your rate limits faster in harnesses that don't cache-optimize.
I'm sorry, but when you wake up in the morning with 12% of your session already used, "it's the cache" is not an acceptable answer.
And I'm using Claude on one small module in my project; the automations that read extra files and balloon the context are a scam.
I do wonder whether it's fair to expect users to absorb cache-miss costs in Claude Code, given how opaque those costs are.
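To make the opacity concrete, here is a rough sketch of the input-side arithmetic for one turn with a long cached prefix. The rates are assumptions (Sonnet-tier list prices per million input tokens at the time of writing: $3 uncached, $3.75 for a 5-minute cache write, $0.30 for a cache read) and may be out of date; the function and token counts are illustrative.

```python
# Assumed Sonnet-tier prices, USD per million input tokens (may be stale).
INPUT_RATE = 3.00     # uncached input
CACHE_WRITE = 3.75    # 5-minute cache write (1.25x input)
CACHE_READ = 0.30     # cache hit (0.1x input)

def turn_cost(prefix_tokens, new_tokens, cache_hit):
    """Input-side cost of one turn: the existing prefix is either read
    from cache or rewritten into it after a miss; the new tokens are
    paid at the write rate so the *next* turn can hit the cache."""
    prefix_rate = CACHE_READ if cache_hit else CACHE_WRITE
    return (prefix_tokens * prefix_rate + new_tokens * CACHE_WRITE) / 1e6

# A 150k-token prefix plus 2k new tokens:
hit = turn_cost(150_000, 2_000, cache_hit=True)    # ~$0.05
miss = turn_cost(150_000, 2_000, cache_hit=False)  # ~$0.57
```

Under these assumed rates a single miss on a 150k-token context costs roughly 10x the hit, which is the asymmetry users never see itemized.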