How much did this pretraining run cost? I am impressed that efforts like this are now practical.
Here is my guess at the cost; please fact-check it if you can.
They report using 10^22 FLOPs. A $5/h[0] EC2 H100 instance (1671 bfloat16 teraFLOPS[0]) will deliver roughly 830 TFLOPS at 50% MFU, so the run takes 10^22/830e12 ≈ 1.2e7 seconds, or about 3,350 GPU-hours. The pretraining run thus costs (10^22/830e12)/3600*5 ≈ $17K.
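
If anyone wants to plug in their own numbers, here is a minimal Python sketch of the same back-of-envelope estimate. The function name and parameters are my own invention, and it assumes the hourly price applies to one GPU running at a constant MFU; it ignores storage, networking, failed runs, and everything else a real bill would include.

```python
def pretraining_cost_usd(total_flops, peak_tflops, mfu, usd_per_hour):
    """Rough cost of a training run (hypothetical helper, not from any library).

    total_flops  -- total training compute in FLOPs
    peak_tflops  -- advertised peak throughput of one GPU, in TFLOPS
    mfu          -- assumed model FLOPs utilization, between 0 and 1
    usd_per_hour -- assumed rental price per GPU-hour
    """
    # Throughput actually achieved, in FLOPs per second.
    effective_flops_per_s = peak_tflops * 1e12 * mfu
    # Wall-clock time on one GPU, converted from seconds to hours.
    gpu_hours = total_flops / effective_flops_per_s / 3600
    return gpu_hours * usd_per_hour


# Numbers from the estimate above: 10^22 FLOPs, 1671 TFLOPS peak, 50% MFU, $5/h.
cost = pretraining_cost_usd(1e22, 1671, 0.5, 5.0)
print(f"Estimated cost: ${cost:,.0f}")
```

The result lands at about $17K, matching the hand calculation. The estimate is linear in every input: halving the MFU or doubling the hourly price doubles the cost.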