
sulam · yesterday at 3:30 PM

I use the chess example because it's especially instructive. It would NOT be trivial to train an LLM to play chess: next-token prediction breaks down when there are so many positions to remember and no adequate way to assign value to intermediate positions. Chess bots work by being trained to assign value to a position, something fundamentally different from what an LLM is doing.
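(To make "assign value to a position" concrete, here is a toy sketch of the kind of hand-written evaluation function classical engines search over. The piece values are the standard material convention; the string encoding of a position is purely illustrative, not any real engine's representation.)

```python
# Toy material-count evaluator: uppercase = White pieces, lowercase = Black.
# Classical chess engines combine an evaluator like this (plus many more
# terms) with deep search; this is only the simplest possible illustration.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def evaluate(position: str) -> int:
    """Score a position given as a string of piece letters.

    Positive = White ahead on material, negative = Black ahead.
    """
    score = 0
    for piece in position:
        value = PIECE_VALUES.get(piece.upper())
        if value is not None:
            score += value if piece.isupper() else -value
    return score

# White has an extra rook relative to Black: +5
print(evaluate("KQRRkqr"))
```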

A simpler example — without tool use, the standard BPE tokenization method made it impossible for state-of-the-art LLMs to tell you how many 'r's are in strawberry. This is because they are thinking in tokens, not letters and not words. Can you think of anything in our intelligence where the way we encode experience makes it impossible for us to reason about it? The closest thing I can come up with is how some cultures/languages have different ways of describing color and as a result cannot distinguish between colors that we think are quite distinct. And yet I can explain that, think about it, etc. We can reason abstractly, and we don't have to resort to a literal deus ex machina to do so.
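(A toy illustration of the point: a BPE tokenizer hands the model subword chunks, not characters. The split of "strawberry" below is hypothetical, not from any real vocabulary, but it shows why letter counting requires access to a level of representation the token IDs hide.)

```python
# Hypothetical BPE-style split: the model sees opaque chunks, not letters.
tokens = ["straw", "berry"]

# Counting 'r's requires the character level, which is what tokenization
# abstracts away from the model:
word = "".join(tokens)
letter_count = word.count("r")
print(letter_count)  # 3 — trivial with characters, hidden behind token IDs
```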

Not being able to explain our brain to you doesn’t mean I can’t notice things that LLMs can’t do, and that we can, and draw some conclusions.


Replies

pu_pe · yesterday at 5:46 PM

There are chess engines based on transformers; even DeepMind released one [1]. It achieved ~2900 Elo. It does have peculiarities, for example in the endgame, that are likely derived from its architecture, but I think it definitely qualifies as evidence that being a next-token predictor doesn't mean a system cannot perform tasks that require intelligence and planning.

The 'r's-in-strawberry problem is more a fundamental limitation of our tokenization procedures than of the transformer architecture. We could easily train an LLM with byte-sized tokens that would nail those problems. It can also be easily fixed with harnessing (i.e., for this class of problems, write a script rather than solving it yourself). We do this all the time ourselves; even mathematicians and physicists will reach for a calculator for all kinds of problems they could in principle solve in their heads.
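(Both fixes are easy to sketch. With byte-sized tokens, each letter becomes its own token and is directly visible; with harnessing, the model just emits a one-line script. A minimal illustration of each:)

```python
# Fix 1: byte-level tokens — one token per byte, so every letter is exposed.
word = "strawberry"
byte_tokens = list(word.encode("utf-8"))
r_count = sum(1 for b in byte_tokens if b == ord("r"))
print(r_count)  # 3

# Fix 2: harnessing — instead of counting "in its head", the model writes
# and runs a trivial script for this class of problem.
print(word.count("r"))  # 3
```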

[1] https://arxiv.org/abs/2402.04494