I think it's just routing to faster hardware:
H100 SXM: 3.35 TB/s HBM3
GB200: 8 TB/s HBM3e (per Blackwell GPU)
That's ~2.4x the memory bandwidth, which matches the speedup they're claiming. I suspect they're just routing to GB200s (or the TPU equivalent, etc.).
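As a back-of-the-envelope check (decode is typically memory-bandwidth-bound, so tokens/s should scale roughly linearly with HBM bandwidth; these are peak spec-sheet numbers, not measured figures):

```python
# Rough sanity check: for bandwidth-bound decoding, per-user tokens/s
# scales ~linearly with HBM bandwidth. Peak per-GPU specs, not benchmarks.
h100_bw = 3.35   # TB/s, H100 SXM (HBM3)
gb200_bw = 8.0   # TB/s, per Blackwell GPU in a GB200 (HBM3e)

print(f"theoretical decode speedup: {gb200_bw / h100_bw:.2f}x")  # ~2.39x
```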
FWIW, I did notice that Opus was _sometimes_ very fast recently. I put it down to a bug in Claude Code's token counting, but perhaps it was actually getting routed to GB200s now and then.
Dylan Patel did an analysis suggesting that, for open models, lower batch sizes plus more speculative decoding yield roughly 2.5x more per-user throughput at about 6x the cost [0]. It seems plausible that's what they're doing here, though we probably won't know for sure any time soon.
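For intuition, here's a toy model of that tradeoff (all parameters are made up by me, chosen only to land near the numbers Dylan quotes; this is not his methodology): per-user decode speed rises sublinearly as the batch shrinks, speculation adds a further multiplier, but aggregate tokens per GPU collapse, so cost per token climbs.

```python
# Toy batch-size / speculative-decoding tradeoff. Every parameter below is
# an illustrative guess, not a real measurement or Dylan Patel's figures.
base_batch, small_batch = 60, 4   # hypothetical concurrent requests per GPU
alpha = 0.25                      # assumed sublinear scaling of per-user speed vs 1/batch
spec_gain = 1.3                   # assumed net speedup from speculative decoding

user_speedup = (base_batch / small_batch) ** alpha * spec_gain
cost_ratio = base_batch / (small_batch * user_speedup)  # cost/token ~ 1/(tokens per GPU)
print(f"per-user speedup: {user_speedup:.1f}x")  # ~2.6x
print(f"cost per token:   {cost_ratio:.1f}x")    # ~5.9x
```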
Regardless, they don't need new hardware to get speedups like this. It's possible you just hit an A/B test rather than newer hardware; I'd be surprised if they were using their latest hardware for inference, tbh.
[0] https://nitter.net/dylan522p/status/2020302299827171430