Inference at low bit-widths is easy. Training is where the wheels come off, because you spend the sa...

hrmtst93837 • today at 9:17 AM • 1 reply • view on HN

Inference at low bit-widths is easy. Training is where the wheels come off, because you spend the saved math budget on gradient tricks and rescaling just to stop the model from drifting.

That trade loses outside tight edge deploymints. Float formats stuck around for boring reasons: they handle ugly value ranges and they fit the GPU stack people already own.

Replies

guerrilla • today at 5:35 PM

Well this is perfect then. We just post-process models like this after training.

alt Hacker News

Replies