logoalt Hacker News

jephstoday at 4:56 PM1 replyview on HN

Scaling curves don't need to be drawn at particularly enormous parameter counts to be useful! If you can do a 300M and 1.2B run (like the authors do here), then you can do 150M, 300M, 600M, and 1.2B runs with only 50% more resources, and get a much better sense for whether effects seem to amplify or diminish as scale increases.


Replies

spindump8930today at 5:44 PM

Exactly. Good peer reviewers understand that you can also move down on the scaling curve, not just up. Also laughable to try a "yolo" run without validating a scaling ladder/curve.