whiplash451, today at 4:42 AM

Maybe because distilling small models from bigger ones that you control gives you better small models than fine-tuning from bigger models you don't control?

(I am not claiming this is the case, only stating it as an assumption.)