Hacker News

Otterly99 · yesterday at 1:14 PM · 2 replies

But chess models aren't trained the same way LLMs are. If I'm not mistaken, they are trained directly on chess moves with pure reinforcement learning, and it's definitely not trivial: AlphaZero, for instance, took 64 TPUs to train.


Replies

ACCount37 · yesterday at 2:27 PM

You can train them in a very similar way.

Modern LLMs often start with "imitation learning" pre-training on web-scale data and continue with RLVR for specific verifiable tasks like coding. You can pre-train a chess transformer on human or engine games in the same "imitation learning" mode, then add RL (against other engines, or as self-play) to iron out the deficiencies and improve performance.

This has been used for a few game engines in practice. It's probably not worth it for chess unless you explicitly want humanlike moves, but games with larger state spaces or incomplete information benefit from the early "imitation learning" regime getting the policy into a reasonable envelope fast.
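The two-stage recipe described above can be sketched in miniature. This is a toy illustration, not anything from a real chess codebase: it uses the game of Nim (take 1–3 stones, taking the last stone wins) in place of chess, a per-state softmax policy in place of a transformer, and all names are made up. Phase 1 is imitation learning against an optimal "oracle" player; phase 2 is REINFORCE-style self-play on top of the imitation-trained policy.

```python
# Toy sketch of "imitation pre-training, then RL self-play" on Nim.
# Everything here (oracle, policy table, update rules) is illustrative,
# standing in for human/engine game data and a transformer policy.
import random, math

N, ACTIONS = 12, (1, 2, 3)  # start with 12 stones; taking the last stone wins

def oracle(stones):
    # Optimal Nim play: leave the opponent a multiple of 4 when possible.
    for a in ACTIONS:
        if a <= stones and (stones - a) % 4 == 0:
            return a
    return random.choice([a for a in ACTIONS if a <= stones])

# "Policy network": per-state logits over the three actions.
logits = {s: [0.0, 0.0, 0.0] for s in range(1, N + 1)}

def probs(s):
    legal = {i for i, a in enumerate(ACTIONS) if a <= s}
    e = [math.exp(logits[s][i]) if i in legal else 0.0 for i in range(3)]
    z = sum(e)
    return [x / z for x in e]

def sample(s):
    r, acc, p = random.random(), 0.0, probs(s)
    for i in range(3):
        acc += p[i]
        if r <= acc:
            return i
    return 2

random.seed(0)

# Phase 1: imitation learning -- cross-entropy steps toward oracle moves.
for _ in range(2000):
    s = random.randint(1, N)
    target = ACTIONS.index(oracle(s))
    p = probs(s)
    for i in range(3):
        logits[s][i] += 0.5 * ((1.0 if i == target else 0.0) - p[i])

# Phase 2: self-play REINFORCE -- the winner's moves get reinforced.
for _ in range(3000):
    s, player, history = N, 0, []
    while s > 0:
        i = sample(s)
        history.append((player, s, i))
        s -= ACTIONS[i]
        player ^= 1
    winner = history[-1][0]  # whoever took the last stone
    for who, st, i in history:
        reward = 1.0 if who == winner else -1.0
        p = probs(st)
        for j in range(3):
            logits[st][j] += 0.1 * reward * ((1.0 if j == i else 0.0) - p[j])

def play_vs_random():
    """Trained policy (greedy) moves first vs. a random player; return winner."""
    s, player = N, 0
    while s > 0:
        if player == 0:
            i = max(range(3), key=lambda i: probs(s)[i])
            a = ACTIONS[i]
        else:
            a = random.choice([a for a in ACTIONS if a <= s])
        s -= a
        player ^= 1
    return player ^ 1  # the player who just moved took the last stone

wins = sum(play_vs_random() == 0 for _ in range(500))
print(wins)  # the trained policy should win the large majority of games
```

The imitation phase alone already gets the policy close to the oracle; the self-play phase is what would carry the real load in domains where no cheap oracle exists, which is the point of the comment above.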

pu_pe · yesterday at 5:47 PM

I meant trivial in the sense that it's a solved problem; I'm sure it still costs a non-negligible amount of money to train. See, for example, the chess transformer DeepMind built a couple of years ago, which I referred to in a sibling comment [1].

[1] https://arxiv.org/abs/2402.04494