7777777phil • today at 6:08 AM

AlphaZero worked because chess and Go have terminal rewards and positions you can prove are right or wrong. General intelligence has neither, and the leap from self-play in a well-defined game to self-play in arbitrary environments is the hard part, which Silver isn't really demoing. Sara Hooker's work on scaling laws lines up here (1).

(1) https://philippdubach.com/posts/the-most-expensive-assumptio...