Unsupervised is basically clustering. Alphago is RL - winning or losing a game is a form of supervi...

jmalicki • today at 6:05 PM • 0 replies • view on HN

Unsupervised is basically clustering. Alphago is RL - winning or losing a game is a form of supervision.

Unsupervised is something where there is no intrinsic reward signal. In pre training, predicting the next token and seeing that it matches is a reward signal, hence it is self supervised.

alt Hacker News