I disagree with this. Reinforcement learning with verifiable rewards training is actually the secret...

LarsDu88 • yesterday at 6:56 PM • 2 replies

I disagree with this. Training with reinforcement learning with verifiable rewards (RLVR) is actually the secret sauce that is enabling Claude and GPT to automate software engineering tasks.

All the easily verifiable domains, such as mathematics, coding, and anything that can be run inside a reasonable simulation, are falling very, very fast.

By next year, if not sooner, LLMs will wildly outpace mathematicians at reasoning.
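
To make "verifiable" concrete, here is a minimal sketch of what a binary verifiable reward for a coding task might look like: run the model's candidate code against a few known input/output cases and return 1.0 only if all of them pass. The names `verifiable_reward`, `candidate_src`, `func_name`, and `cases` are made up for illustration and are not taken from any real training stack; an actual setup would also sandbox execution and enforce timeouts.

```python
from typing import Any


def verifiable_reward(
    candidate_src: str,
    func_name: str,
    cases: list[tuple[tuple, Any]],
) -> float:
    """Return 1.0 if the candidate passes every test case, else 0.0.

    Hypothetical helper for illustration only.
    """
    namespace: dict[str, Any] = {}
    try:
        # Assumption: unsandboxed exec is acceptable for a toy example.
        # Never do this with untrusted model output in practice.
        exec(candidate_src, namespace)
        func = namespace[func_name]
        for args, expected in cases:
            if func(*args) != expected:
                return 0.0
    except Exception:
        # Syntax errors, missing functions, and runtime errors all score zero.
        return 0.0
    return 1.0


# Two hand-written test cases for a hypothetical `add` task.
cases = [((2, 3), 5), ((-1, 1), 0)]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"

print(verifiable_reward(good, "add", cases))  # 1.0
print(verifiable_reward(bad, "add", cases))   # 0.0
```

The reward here only reflects the cases supplied, so it says nothing about inputs the tests do not cover.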

Replies

Alex_L_Wood • yesterday at 8:53 PM

Coding is anything but “easily” verifiable.


