
whinvik today at 12:09 PM

Looks very interesting. Can you comment on why you think this model can give comparable performance with less training data?


Replies

adebayoj today at 1:36 PM

We train the model with `explanations`. Most training only asks the model to predict the next token or group of tokens. Our training says: predict the next group of tokens (causal diffusion), but these tokens should also be about {sports/art/coding/etc}. So in addition to token-level supervision, the model gets concept-level supervision, which forces it to learn these high-level concepts more quickly.
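
To make the idea concrete, here is a minimal PyTorch-style sketch of combining a next-token loss with an auxiliary concept-classification loss. Everything here is illustrative, not their implementation: the names (`ConceptSupervisedLM`, `training_loss`, `alpha`), the assumption of one coarse concept label per training chunk, and the use of a plain autoregressive transformer rather than the causal diffusion objective they mention.

```python
# Hypothetical sketch (not the authors' code): joint token-level and
# concept-level supervision, assuming each training chunk carries a coarse
# concept label (sports/art/coding/...).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptSupervisedLM(nn.Module):
    def __init__(self, vocab_size=32000, num_concepts=16,
                 d_model=256, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.token_head = nn.Linear(d_model, vocab_size)      # next-token logits
        self.concept_head = nn.Linear(d_model, num_concepts)  # chunk-level concept logits

    def forward(self, tokens):
        seq_len = tokens.size(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(tokens.device)
        h = self.encoder(self.embed(tokens), mask=causal_mask)  # (batch, seq, d_model)
        token_logits = self.token_head(h)                        # per-position token prediction
        concept_logits = self.concept_head(h.mean(dim=1))        # pooled prediction of the chunk's concept
        return token_logits, concept_logits

def training_loss(model, tokens, concept_labels, alpha=0.5):
    # Standard next-token loss: predict token t+1 from tokens up to t.
    token_logits, concept_logits = model(tokens[:, :-1])
    lm_loss = F.cross_entropy(
        token_logits.reshape(-1, token_logits.size(-1)),
        tokens[:, 1:].reshape(-1),
    )
    # Auxiliary concept loss: the same hidden states must also predict the
    # chunk's concept label, adding concept-level supervision on top of
    # token-level supervision.
    concept_loss = F.cross_entropy(concept_logits, concept_labels)
    return lm_loss + alpha * concept_loss

# Usage sketch:
# model = ConceptSupervisedLM()
# tokens = torch.randint(0, 32000, (8, 128))      # batch of token chunks
# concepts = torch.randint(0, 16, (8,))           # one concept label per chunk
# loss = training_loss(model, tokens, concepts)
# loss.backward()
```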