CoT is basically bullshit, entirely confabulated and not related to any "thought process"...
But still CoT distillation WORKS. See the DeepSeek R1 paper.
Tokens attend to each other, and every generated token is another forward pass. More tokens, more compute.
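Rough numbers for the "more tokens, more compute" point. This is only a back-of-envelope sketch using the common rule of thumb of about 2 FLOPs per parameter per generated token, and it ignores the extra attention cost over a growing context. The 7B model size and the token counts are made up for illustration, and `decode_flops` is a hypothetical helper, not anything from the R1 paper.

```python
def decode_flops(n_params: int, n_tokens: int) -> int:
    """Approximate FLOPs to generate n_tokens with an n_params-parameter model.

    Assumes ~2 FLOPs per parameter per token (one multiply, one add).
    Ignores the attention cost that grows with context length.
    """
    return 2 * n_params * n_tokens


N_PARAMS = 7_000_000_000  # hypothetical 7B model

direct = decode_flops(N_PARAMS, 5)      # short answer, no CoT (made-up length)
with_cot = decode_flops(N_PARAMS, 500)  # same answer after a long CoT (made-up length)

# Under this approximation, compute scales linearly with tokens generated.
print(with_cot // direct)  # 100
```

So whatever the text of the CoT "means", a 100x longer output is roughly 100x more computation spent before the answer comes out.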