Hacker News

nbardy, today at 9:57 AM

Confidently, yes. OpenAI has for sure been training larger models internally and distilling them.

Pre-training scaling laws all support larger models being more cost-efficient to train than smaller models. And distillation is comparatively cheap. So you get the most juice by training the biggest model you can and then distilling it.