Yes, they can.
Some people like to parrot "next token prediction", "LLMs can only interpolate", and other such nonsense, but these claims are obviously not true, for many reasons, especially since the introduction of RL.
Humans do not have a monopoly on generating novel ideas; modern AI models, with post-training, RL, and so on, can arrive at them the same way we do: through exploration.
See also verifier's law [0]: "The ease of training AI to solve a task is proportional to how verifiable the task is. All tasks that are possible to solve and easy to verify will be solved by AI."
This played out in chess, Go, and other strategy games, and we can now see it applying to mathematics, algorithmic problems, and more.
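To make this concrete, here is a toy sketch of RL with a verifiable reward (my own illustration with a made-up divisor-finding task, not any lab's actual pipeline): when a solution can be checked cheaply and exactly, the check itself is the training signal, so the policy improves by exploration alone, without imitating labelled solutions.

```python
# Toy sketch of RL with a verifiable reward (hypothetical setup, purely illustrative).
import numpy as np

rng = np.random.default_rng(0)

N = 91                               # "task": find a nontrivial divisor of 91
candidates = np.arange(2, 20)        # the toy policy's discrete action space
logits = np.zeros(len(candidates))   # policy parameters (softmax over candidates)

def verifier(answer: int) -> float:
    """Cheap, exact check of a proposed solution -- the 'verifiable' part."""
    return 1.0 if N % answer == 0 else 0.0

def sample(logits):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    i = rng.choice(len(candidates), p=p)
    return i, p

lr = 0.5
for step in range(200):
    i, p = sample(logits)
    reward = verifier(candidates[i])  # exploration + verification, no human labels
    # REINFORCE-style update: push probability mass toward verified answers
    grad = -p
    grad[i] += 1.0
    logits += lr * reward * grad

best = candidates[np.argmax(logits)]
print("most probable answer after training:", best, "verified:", bool(verifier(best)))
```

The asymmetry is the whole point: checking "is this a divisor of 91" is trivial, so the reward needs no human in the loop, which is exactly the regime verifier's law describes.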
It is incredibly humbling to see AI outperform humans at creative cognitive tasks, and realise that the bitter lesson [1] applies so generally, but here we are.
[0] https://www.jasonwei.net/blog/asymmetry-of-verification-and-...
[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Reinforcement learning for "reasoning" perturbs the model to generate completions in a particular chain of thought / alternative selection structure. It's three next token predictors in a trench coat.
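To be concrete about what I mean (a toy stand-in, not a real LLM or any actual training setup): the sampling loop is identical with or without RL post-training; only the learned distribution over continuations changes.

```python
# Toy illustration: the same autoregressive next-token loop serves both a "base"
# and an "RL-tuned" model; RL only reweights which continuations the weights favor.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<think>", "step", "check", "</think>", "answer", "<eos>"]

def make_model(bias_toward_reasoning: float):
    """Return a next-token distribution; the 'RL-tuned' toy just reweights transitions."""
    idx = {t: i for i, t in enumerate(vocab)}
    def next_token_probs(context):
        p = np.ones(len(vocab))
        if context and context[-1] == "<think>":
            p[idx["step"]] += bias_toward_reasoning   # RL-shaped preference for CoT structure
        if context and context[-1] == "step":
            p[idx["check"]] += bias_toward_reasoning
        return p / p.sum()
    return next_token_probs

def generate(next_token_probs, max_len=10):
    out = ["<think>"]
    while len(out) < max_len and out[-1] != "<eos>":
        p = next_token_probs(out)
        out.append(vocab[rng.choice(len(vocab), p=p)])
    return out

base_model = make_model(bias_toward_reasoning=0.0)
rl_model   = make_model(bias_toward_reasoning=5.0)
# Identical sampling loop for both; only the learned distribution differs.
print("base:", " ".join(generate(base_model)))
print("rl:  ", " ".join(generate(rl_model)))
```

The sketch makes a narrow point: RL changes the weights, not the generation procedure.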
> but these claims are obviously not true, for many reasons, especially since the introduction of RL.
RL or no RL, AI cannot escape the distribution it's trained on. It's just that the labs will put so much into that distribution that we won't be able to tell the difference easily, nor will it matter for most tasks. I will probably call it AGI/ASI like everyone else when it arrives, even though it will be structurally incapable of reasoning out of distribution. There's no need to appeal to magic to explain what the OP describes: just a very, very big distribution.