Hacker News

matt123456789 · today at 3:54 AM · 1 reply

Such a low-dimensional LoRA vector must surely amount to a close-to-linear modification of the KV calculation. This seems to imply that what we call "reasoning" is already latent within the model. Pretty clearly I didn't read the paper; I'm sure the authors address this.
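For what it's worth, the "low dimensionality" point can be made concrete with a minimal numpy sketch of a standard LoRA-style update. All names and sizes below are illustrative, not from the paper: the frozen weight `W` (e.g. a K or V projection) is perturbed by a product `B @ A` of rank at most `r`, so the adaptation is a linear change confined to a small subspace.

```python
import numpy as np

d_out, d_in, r = 64, 64, 4               # r << d: the low dimensionality in question
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))   # frozen pretrained projection weight
A = rng.standard_normal((r, d_in))       # trainable down-projection
B = rng.standard_normal((d_out, r))      # trainable up-projection
alpha = 8.0                              # illustrative scaling hyperparameter

delta = (alpha / r) * B @ A              # the LoRA update itself
W_adapted = W + delta                    # adapted weight: a linear perturbation of W

# The perturbation has rank at most r, so it can only move activations
# within an r-dimensional subspace of the d_out-dimensional output space.
print(np.linalg.matrix_rank(delta))
```

Because `delta` is added to `W` rather than composed nonlinearly, the adapted projection stays linear in its input; only the subsequent attention softmax and MLPs introduce nonlinearity.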


Replies

a-t-c-g · today at 3:58 AM

Yes - some degree of reasoning appears to be latent in the structure of language itself. But models trained explicitly on reasoning-focused data still perform better than models trained only on general corpora.*

*At least up to 300B parameters, based on the models we’ve tested.