
buildbot, last Wednesday at 6:06 PM

You can even train in 4 & 8 bits with newer microscaled formats! From https://arxiv.org/pdf/2310.10537 to gpt-oss being trained (partially) natively in MXFP4 - https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-...

All the way to Nemotron 3 Super, which had 25T tokens of native NVFP4 pretraining! https://docs.nvidia.com/nemotron/0.1.0/nemotron/super3/pretr...
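For anyone curious what "microscaled" means here: in MX formats like MXFP4, each block of 32 elements shares one power-of-two scale, and each element is stored as a 4-bit float (E2M1, whose magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, 6). A rough numpy sketch of the quantize/dequantize round trip, with a simplified scale rule (the actual OCP MX spec picks the shared exponent slightly differently):

```python
import numpy as np

# Magnitudes representable in FP4 E2M1 (per the OCP MX spec)
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize(x, block=32):
    """Illustrative MXFP4-style fake-quantization (returns dequantized values)."""
    x = np.asarray(x, dtype=np.float32)
    xb = x.reshape(-1, block)
    # One shared power-of-two scale per 32-element block, chosen here so the
    # block max fits inside the FP4 range (spec uses a slightly different rule).
    amax = np.abs(xb).max(axis=1, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)
    scale = 2.0 ** np.ceil(np.log2(amax / FP4_GRID[-1]))
    # Round each scaled magnitude to the nearest FP4 grid point
    scaled = xb / scale
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    out = np.sign(scaled) * FP4_GRID[idx] * scale
    return out.reshape(x.shape)

x = np.linspace(-6, 6, 32).astype(np.float32)
xq = mxfp4_quantize(x)
print(np.abs(x - xq).max())  # worst-case error is half the widest grid gap, times the scale
```

The shared block scale is what makes 4-bit training viable: outliers only blow up the dynamic range of their own 32-element block rather than the whole tensor.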


Replies

naasking, yesterday at 2:32 PM

Newer quantization approaches are even better; 4 bits gets you no meaningful quality loss relative to FP16: https://github.com/z-lab/paroquant

Hopefully Microsoft keeps pushing BitNet too, so only "1.58" bits are needed.
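The "1.58 bits" is log2(3): BitNet b1.58 constrains weights to {-1, 0, +1}. The paper's absmean scheme scales weights by their mean absolute value, then rounds and clamps; a minimal sketch (function name is mine):

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """BitNet b1.58-style absmean quantization to ternary codes {-1, 0, +1}."""
    w = np.asarray(w, dtype=np.float32)
    gamma = np.abs(w).mean() + eps             # absmean scale factor
    wq = np.clip(np.round(w / gamma), -1, 1)   # ternary codes
    return wq, gamma                           # dequantize as wq * gamma

w = np.random.randn(8, 8).astype(np.float32)
wq, gamma = ternary_quantize(w)
print(sorted(set(wq.ravel().tolist())))  # subset of [-1.0, 0.0, 1.0]
```

With ternary weights, matmuls reduce to additions and subtractions plus one multiply by gamma per tensor, which is where the efficiency claims come from.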

I think fractional representations are only relevant for training at this point, and bf16 is sufficient there; no need for fp4 and such.
