visarga • yesterday at 5:56 PM

You talk as if problem solving were a supervised (imitation) learning problem. No, it is a reinforcement learning problem: models learn by solving problems and getting rated. They generate their own training data. The optimal budget allocation is 1/3 of the cost on pre-training, 1/3 on RL, and 1/3 on inference.
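To make the "solve, get rated, learn from your own attempts" loop concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the task (adding two digits), the three candidate operations, the 0/1 rater, and the names `OPS`, `softmax`, and `train` are all made up, and the update is a plain REINFORCE step on a softmax policy, not anything a real lab is claimed to use. The point is only that no labeled answers are imitated; the training signal comes from the model's own rated attempts.

```python
import math
import random

# Hypothetical toy "skills" the policy can try on a problem (a, b).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
NAMES = list(OPS)


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.1, seed=0):
    """REINFORCE on a softmax policy over OPS; returns final probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(NAMES)
    for _ in range(steps):
        # 1. Sample a problem.
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        # 2. The policy generates its own attempt (its own training data).
        probs = softmax(logits)
        k = rng.choices(range(len(NAMES)), weights=probs)[0]
        attempt = OPS[NAMES[k]](a, b)
        # 3. A rater scores the attempt: 1 if it solved the problem, else 0.
        reward = 1.0 if attempt == a + b else 0.0
        # 4. Policy-gradient update: d log pi(k) / d logit_j = 1[j == k] - probs[j].
        for j in range(len(logits)):
            indicator = 1.0 if j == k else 0.0
            logits[j] += lr * reward * (indicator - probs[j])
    return dict(zip(NAMES, softmax(logits)))


if __name__ == "__main__":
    for name, p in train().items():
        print(f"{name}: {p:.3f}")
```

A supervised version would instead be shown the correct operation for each problem and trained to copy it; here the policy only ever sees its own samples and their ratings, and the probability mass still moves toward the operation that gets rewarded.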
