I'll take this opportunity to repeat that the natural-language interpretation of thinking traces doesn't appear to be "real" by any reasonable definition, even if the traces can at times be useful (or at least seem to be). There's research demonstrating that arbitrary symbols, even a single repeated symbol, lead to a similar improvement in ability. This makes sense if you consider how the attention mechanism and KV cache work as the sequence iteratively grows: every generated token, whatever it says, adds another position that later tokens can attend to.
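To make the KV cache point concrete, here's a rough toy sketch of a single attention head with a growing cache. Everything in it is made up for illustration (the `KVCache` class, the random weight matrices, the 4-dimensional "embeddings"); it isn't any real model's code. All it shows is the bookkeeping: a repeated filler symbol still adds a key/value entry on every step, and the next token attends over all of them.

```python
import math
import random

random.seed(0)
DIM = 4  # toy embedding size (assumption, purely illustrative)

def rand_vec():
    return [random.uniform(-1, 1) for _ in range(DIM)]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

# Random projections standing in for learned query/key/value weights.
W_Q, W_K, W_V = ([rand_vec() for _ in range(DIM)] for _ in range(3))

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class KVCache:
    """Hypothetical single-head cache: one key/value pair per token seen."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, x):
        # Cache this token's key and value, then attend over the whole cache.
        self.keys.append(matvec(W_K, x))
        self.values.append(matvec(W_V, x))
        q = matvec(W_Q, x)
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM)
            for k in self.keys
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[i] for w, v in zip(weights, self.values))
            for i in range(DIM)
        ]
        return out, weights

prompt = [rand_vec() for _ in range(3)]
filler = rand_vec()  # one "meaningless" symbol, repeated

cache = KVCache()
for x in prompt:
    cache.step(x)
for _ in range(5):
    cache.step(filler)  # the cache grows even though the token carries no meaning

_, weights = cache.step(rand_vec())  # the "answer" position
print(len(cache.keys), "cached positions; attention spread over", len(weights))
```

In a real multi-layer model the hidden state at each filler position is computed from the preceding context, so those extra positions can carry intermediate computation no matter which symbol is printed there. The toy above doesn't model that part, only the growth of the cache.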
Basically, we optimize the models to produce output with certain characteristics, but that doesn't mean that what we see is the whole truth, or even that the relationships in the underlying system are structured the way we might expect.