This has not been my experience with Opus since Anthropic released the 1M token context window for use under the subscription plans. I routinely push past 500k tokens, even sometimes up to around 800k tokens, and don't see this problem. I've seen it to some extent when getting truly near the limit, up around and above 900k tokens, though what I see isn't as severe as the author seems to see.
(And I rarely fill the context window that far anyway when working on a single task, or a series of tasks that are related enough to warrant the same context; more typical is anywhere between 200k and 600k or so.)
I'm not saying that no one ever has this experience, but it's odd to me that some people see it so often that it warrants giving it a name.
Opus 4.6 was on drugs past 200k, I skipped 4.7, 4.8 did good up to ~350k, and Fable did great beyond 400k, in my limited testing. The quality does appear to be trending upwards.
agreed. the claudes have been getting better and better with every release in this regard.
opus 4.5 would start failing tool calls when approaching its 200k limit, opus 4.6 could get to ~300k before getting confused, opus 4.7 i could stretch to around 400k before the dumb zone started, with opus 4.8 i've had sessions get over 500k comfortably.
admittedly we only had limited time with fable, but i had a couple sessions get into 800-900k just fine.
I often push past 300k or so, and I’ve absolutely worked at 800k, but it’s an observable problem. Large context windows can work depending on the problem, but I do feel more effective biasing towards smaller ones (<300k).
That's another problem with this post: the author mentions Claude but not explicitly which models...
100k tokens "by lunch" also doesn't match my experience; the newer models will hit that already, right in the initial exploratory phase.
I’ve had similar experiences with Fable. 70%+ context used out of 1M, still sharp and no memory issues.
I have a custom build command for a Rust project (yarn build:lib), and in my experience the models keep using it up to about 120k tokens for GLM and roughly 200-300k for Opus. After that, they default to cargo build.
As the gamblers say at the poker table: If you can't figure out who the mark is when you sit down...
I see this said often and find it insane given how many times I find opus models making basic recall mistakes at <100k tokens.
Personally I consider <60k to be the smart zone for opus. This is worse for opus 4.7 and 4.8 because of the more granular tokenizer.