This is because the "thinking" you see is a summary by a highly quantized model - not the actual model, to mask these tokens