My favorite conspiracy explanation:
Claude has gotten a lot of popular media attention in the last few weeks, and the influx of users is constraining compute/memory on an already compute-heavy model. So you get all the suspected "tricks" like quantization, shorter thinking, and KV cache optimizations.
It feels like the same thing that happened to Gemini 3, and you can even feel it over the course of a day (the models seem smartest at 12am).
Dario, in his interview with Dwarkesh last week, also repeated the same refrain that other lab leaders have: compute is constrained and there are big tradeoffs in how you allocate it. It seems safe to infer, then, that they will use any trick they can to free up compute.