Hacker News

TeMPOraL yesterday at 1:47 PM

DeepSeek R1 was a famous case: not only did it briefly beat the then-SOTA on the cheap, it was also released with distilled versions that preserved the bulk of the improvements but could be run on higher-end consumer hardware.

And of course Gemma models are said to be distillations of Gemini.


Replies

epolanski yesterday at 2:06 PM

The distillation you're talking about is about cutting the number of weights; it has nothing to do with extracting Q&A pairs from another model.