what's wild is they accidentally solved it: pretraining IS unsupervised learning at scale, and RLHF IS reinforcement learning. they just didn't know the recipe yet
pretraining isn't unsupervised, it's self-supervised, which means it's moderately more limited in how far it can scale.