jephs • today at 4:56 PM • 1 reply

Scaling curves don't need to be drawn at particularly enormous parameter counts to be useful! If you can do a 300M and a 1.2B run (like the authors do here), then you can do 150M, 300M, 600M, and 1.2B runs with only about 50% more compute (assuming cost scales roughly linearly with parameter count), and get a much better sense of whether effects amplify or diminish as scale increases.
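To sanity-check the "50% more" figure, here is a rough back-of-the-envelope sketch. It assumes the common ~6 FLOPs per parameter per token rule of thumb for training cost, and the token budgets (a fixed 100B tokens, or 20 tokens per parameter) are illustrative assumptions, not numbers from the paper under discussion.

```python
def train_flops(params, tokens):
    # Rule-of-thumb estimate (an assumption): ~6 FLOPs per parameter per token.
    return 6 * params * tokens


def ladder_overhead(base, extra, tokens_for):
    """Cost of the extra runs as a fraction of the cost of the base runs."""
    base_cost = sum(train_flops(n, tokens_for(n)) for n in base)
    extra_cost = sum(train_flops(n, tokens_for(n)) for n in extra)
    return extra_cost / base_cost


BASE = [300e6, 1.2e9]    # the two runs mentioned in the comment
EXTRA = [150e6, 600e6]   # the two runs added to fill out the ladder

# Case 1: every run sees the same number of tokens (hypothetical 100B).
fixed = ladder_overhead(BASE, EXTRA, lambda n: 100e9)

# Case 2: tokens grow in proportion to parameters (hypothetical 20 per param).
proportional = ladder_overhead(BASE, EXTRA, lambda n: 20 * n)

print(f"fixed token budget: +{fixed:.0%}")
print(f"tokens proportional to params: +{proportional:.0%}")
```

With a fixed token budget the two extra runs cost 50% more, matching the figure above. If tokens scale with parameter count, cost grows quadratically with model size and the overhead drops to about 25%, so the smaller rungs of the ladder are even cheaper.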

Replies

spindump8930 • today at 5:44 PM

Exactly. Good peer reviewers understand that you can also move down the scaling curve, not just up. It's also laughable to try a "yolo" run without first validating a scaling ladder/curve.
