I may have misunderstood: is not reasoning training (RLVR) independent from the use of the "<...

mdp2021 • today at 11:04 AM • 0 replies • view on HN

I may have misunderstood: isn't reasoning training (RLVR) independent of the use of the "<think>" tags? Isn't it a method that improves results in reasoning? How do we know that it was not carried out?

Incidentally: I am trying to spend some time researching the progress in this area (the jump from parroting, to inconsistent apparent reasoning, to reliable reasoning).
