Hacker News

LarsDu88 yesterday at 6:56 PM

I disagree with this. Reinforcement learning with verifiable rewards is actually the secret sauce that is letting Claude and GPT automate software engineering tasks.

All the easily verifiable domains, such as mathematics, coding, and things that can be run inside a reasonable simulation, are falling very, very fast.

By next year, if not sooner, mathematicians will be wildly outpaced by LLMs at reasoning.


Replies

Alex_L_Wood yesterday at 8:53 PM

Coding is anything but “easily” verifiable.
