It's strange that they don't include reasoning training (RLVR). Their justification doesn't sound convincing:
> While reasoning models have grown in popularity in recent years, their abilities aren’t always the most efficient way to get a result. In enterprise settings, token costs and speed are often as important as performance. That is why turning to less expensive, non-reasoning models with similar benchmark performance for select tasks like instruction following and tool calling makes sense for enterprise users.
I guess they currently don't have the ability to do proper RLVR.
I may have misunderstood: isn't reasoning training (RLVR) independent of the use of "<think>" tags? Isn't it simply a method that improves results on reasoning tasks? How do we know it wasn't carried out? A sketch of my understanding follows.
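For what it's worth, my understanding (which may be where I'm going wrong) is that RLVR just means the reward comes from a programmatic verifier (an exact-match check, a unit test) rather than a learned preference model, and the "<think>" tags are only an output format. Here's a minimal toy sketch of one RLVR step under that assumption; the `ToyPolicy` object and its `sample`/`reinforce` methods are hypothetical stand-ins I made up for illustration, not any real API:

```python
import random
import re
import statistics

class ToyPolicy:
    """Hypothetical stand-in for a language model policy (purely
    illustrative; a real policy would be a neural LM with a
    gradient-based update)."""
    def sample(self, prompt: str) -> str:
        return random.choice([
            "<think>2+2 is 4</think> The answer is 4.",  # reasoning format
            "The answer is 4.",                          # direct format
            "The answer is 5.",                          # wrong answer
        ])
    def reinforce(self, prompt: str, completion: str, advantage: float):
        # A real implementation would nudge log-probs by the advantage.
        print(f"advantage={advantage:+.2f} for: {completion!r}")

def verify(expected: int, completion: str) -> float:
    """Verifiable reward: 1.0 if the last number matches, else 0.0.
    Note it never looks at whether <think> tags are present."""
    numbers = re.findall(r"-?\d+", re.sub(r"<think>.*?</think>", "", completion))
    return 1.0 if numbers and int(numbers[-1]) == expected else 0.0

def rlvr_step(policy, prompt: str, expected: int, n_samples: int = 8):
    completions = [policy.sample(prompt) for _ in range(n_samples)]
    rewards = [verify(expected, c) for c in completions]
    baseline = statistics.mean(rewards)  # group-mean baseline
    for c, r in zip(completions, rewards):
        policy.reinforce(prompt, c, advantage=r - baseline)

rlvr_step(ToyPolicy(), "What is 2+2?", expected=4)
```

The training signal here depends only on the verified answer, so on this reading RLVR is orthogonal to whether the model emits "<think>" spans at all.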
Incidentally, I am trying to spend some time researching progress in the area (the jump from parroting, to inconsistent apparent reasoning, to reliable reasoning).