I may have misunderstood: isn't reasoning training (RLVR) independent of the use of the "<think>" tags? Isn't it a method that improves results in reasoning? How do we know that it was not carried out?
Incidentally, I am trying to spend some time researching the progress in this area (the jump from parroting, to inconsistent apparent reasoning, to reliable reasoning).