Hacker News

janalsncm, yesterday at 7:38 AM

The bitter-er lesson is that distillation from bigger models works pretty damn well. It’s great news for the GPU poor, not great for the guys training the models we distill from.