nbardy • today at 9:57 AM

Confidently yes. OpenAI for sure has been training larger models internally and distilling.

Pre-training scaling laws all support larger models being more cost-efficient to train than smaller models. And distillation is comparatively cheap. So you can get the most juice by training the biggest model you can and then distilling it.
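For anyone unfamiliar with distillation, here is a minimal sketch of the classic soft-target loss from Hinton et al. (2015). The small "student" model is trained to match the large "teacher" model's temperature-softened output distribution. This is a generic textbook illustration, not a claim about how OpenAI actually distills its models. The function names and the temperature of 2.0 are hypothetical choices for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Hypothetical helper: KL(teacher || student) on softened distributions.

    The T^2 scaling follows Hinton et al. (2015). It keeps gradient magnitudes
    comparable across different temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2

# Usage: the loss is zero when the student already matches the teacher,
# and positive when the two distributions disagree.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

This also shows why the student side is relatively cheap. The teacher only needs forward passes to produce its logits, and only the smaller student is updated by gradient descent.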
