
whinvik today at 12:09 PM

Looks very interesting. Can you comment on why you think this model can give comparable performance with less training data?


Replies

adebayoj today at 1:36 PM

We train the model with `explanations`. Most training only asks the model to predict the next token or group of tokens. Our training says: predict the next group of tokens (causal diffusion), but these tokens should also be about {sports/art/coding/etc}. So in addition to token-level supervision, the model gets concept-level supervision, which forces it to learn these high-level concepts more quickly.
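
To make the idea concrete, here is a minimal PyTorch-style sketch of combining a next-token loss with an auxiliary concept-classification loss. Everything here is illustrative, not their implementation: the names (`ConceptSupervisedLM`, `training_loss`, `alpha`), the assumption of one coarse concept label per training chunk, and the use of a plain autoregressive transformer rather than the causal diffusion objective they mention.

```python
# Hypothetical sketch (not the authors' code): joint token-level and
# concept-level supervision, assuming each training chunk carries a coarse
# concept label (sports/art/coding/...).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptSupervisedLM(nn.Module):
    def __init__(self, vocab_size=32000, num_concepts=16,
                 d_model=256, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.token_head = nn.Linear(d_model, vocab_size)      # next-token logits
        self.concept_head = nn.Linear(d_model, num_concepts)  # chunk-level concept logits

    def forward(self, tokens):
        seq_len = tokens.size(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(tokens.device)
        h = self.encoder(self.embed(tokens), mask=causal_mask)  # (batch, seq, d_model)
        token_logits = self.token_head(h)                        # per-position token prediction
        concept_logits = self.concept_head(h.mean(dim=1))        # pooled prediction of the chunk's concept
        return token_logits, concept_logits

def training_loss(model, tokens, concept_labels, alpha=0.5):
    # Standard next-token loss: predict token t+1 from tokens up to t.
    token_logits, concept_logits = model(tokens[:, :-1])
    lm_loss = F.cross_entropy(
        token_logits.reshape(-1, token_logits.size(-1)),
        tokens[:, 1:].reshape(-1),
    )
    # Auxiliary concept loss: the same hidden states must also predict the
    # chunk's concept label, adding concept-level supervision on top of
    # token-level supervision.
    concept_loss = F.cross_entropy(concept_logits, concept_labels)
    return lm_loss + alpha * concept_loss

# Usage sketch:
# model = ConceptSupervisedLM()
# tokens = torch.randint(0, 32000, (8, 128))      # batch of token chunks
# concepts = torch.randint(0, 16, (8,))           # one concept label per chunk
# loss = training_loss(model, tokens, concepts)
# loss.backward()
```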