ex-aws-dude • today at 4:30 AM • 3 replies

Is RLVR the key breakthrough behind the uplift, or is there more to it?

Does that suggest the uplift was only for things that are easily verifiable, like code?

Replies

Yes, with good RLVR at scale you can greatly improve performance, especially on benchmarks.

The hope was that good RLVR on relatively contrived datasets (like benchmarks) would generalize to good software taste. That has somewhat succeeded, but the models still fail in horrible ways.

The hope beyond that is that good skills in fundamental problem-solving tasks (coding, math) would generalize to tasks beyond math and code, which did happen, but to a lesser extent.

rdedev • today at 4:54 AM

I would say that most improvements are in easily verifiable things like code or math. At least that's where all the amazing results seem to be coming from.

I'm not sure about other domains, but I've heard from people like Cal Newport that the rate of improvement outside of code and math is not as impressive.

4b11b4 • today at 4:32 AM

We're gonna find out that RL gets abandoned, cuz we don't even know what is getting "aligned". Just my naive gut feeling, don't take it seriously.
