Hacker News

girvo, yesterday at 10:47 PM

> I suspect nobody is doing real student teacher distillation

It gets used for quantisation, basically to recover accuracy at lower-precision quants (Nvidia calls it QAD, quantisation-aware distillation). Can’t speak to how widespread it is, though.
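For anyone unfamiliar with the idea, here is a rough sketch of the loss involved, assuming a toy one-layer "model" and symmetric 4-bit fake quantisation. Every name in it (`fake_quantize`, `linear`, `distillation_loss`) is made up for illustration and is not Nvidia's API. It only computes the teacher-vs-student loss; an actual QAD run would then train the quantised student to minimise it.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of floats.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def fake_quantize(weights, bits=4):
    # Hypothetical helper: symmetric uniform quantisation.
    # Round each weight to the nearest grid point, then dequantise.
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]

def linear(weights, x, n_out):
    # Toy single-layer model: row-major weight matrix times input vector.
    n_in = len(x)
    return [sum(weights[o * n_in + i] * x[i] for i in range(n_in))
            for o in range(n_out)]

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    # KL(teacher || student) over temperature-softened distributions.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

# Full-precision teacher weights (made-up values) and one input.
teacher_w = [0.31, -1.27, 0.05, 0.88, 0.42, -0.63]
x = [1.0, -2.0, 0.5]

teacher_logits = linear(teacher_w, x, n_out=2)
student_logits = linear(fake_quantize(teacher_w, bits=4), x, n_out=2)

# The quantity a QAD-style training loop would push towards zero.
loss = distillation_loss(teacher_logits, student_logits)
print(f"distillation loss after 4-bit quantisation: {loss:.6f}")
```

The student here is just the teacher's weights rounded to a 4-bit grid, so the loss measures how far quantisation alone has moved the output distribution.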


Replies

rao-v, yesterday at 11:08 PM

Yes, absolutely! I should have been more specific: I don’t believe people are using it to train 30B models from 300B models (and I’d love to learn that I’m wrong about this).