logoalt Hacker News

rao-vtoday at 6:04 AM0 repliesview on HN

So first - these are terrific papers and I'd not seen some of them before.

Having said that, I don't think these are classic student teacher distillation from random (which was my point). In fact, the "Embarrassingly Simple Self-Distillation" paper is using exactly what I was talking about "fine-tune on those samples with standard supervised fine-tuning".