Newer quantization approaches are even better; 4-bit quantization shows no meaningful loss relative to FP16: https://github.com/z-lab/paroquant
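To make "4-bit" concrete, here's a minimal sketch of symmetric per-group INT4 weight quantization (group size, shapes, and the round-to-nearest scheme are illustrative assumptions, not ParoQuant's actual method):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)  # toy weight matrix

def quantize_int4(x, group=32):
    # Symmetric per-group INT4: integer levels in [-8, 7],
    # one FP scale per group of 32 weights.
    xg = x.reshape(-1, group)
    scale = np.abs(xg).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(xg / scale), -8, 7)
    return q, scale

def dequantize(q, scale, shape):
    return (q * scale).reshape(shape)

q, s = quantize_int4(w)
w_hat = dequantize(q, s, w.shape)
err = np.abs(w - w_hat).max()
print(f"max abs round-trip error: {err:.4f}")
```

The per-element error is bounded by half a quantization step, which is why well-behaved (outlier-free) weight groups survive 4 bits so well.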
Hopefully Microsoft keeps pushing BitNet too, so that only "1.58" bits per weight are needed.
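The "1.58" comes from ternary weights: three states per weight is log2(3) ≈ 1.58 bits. A rough sketch of absmean-style ternarization (illustrative, not the official BitNet implementation):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(1024).astype(np.float32)

# Scale by the mean absolute value, then round each weight to the
# nearest of {-1, 0, +1} (absmean-style ternarization, rough sketch).
scale = np.abs(w).mean()
w_ternary = np.clip(np.round(w / scale), -1, 1)

# Three states per weight -> log2(3) ≈ 1.58 bits of information each.
bits_per_weight = math.log2(3)
print(f"{bits_per_weight:.2f} bits/weight")
```

With only {-1, 0, +1}, matrix multiplies reduce to additions and subtractions, which is where the hoped-for efficiency comes from.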
I think fractional representations are only relevant for training at this point; bf16 is sufficient, so there's no need for fp4 and the like.
Learned rotations for INT4 are cool! Seems similar to SpinQuant? https://arxiv.org/abs/2405.16406
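The intuition behind rotation-before-quantization can be shown with a toy example: a random orthogonal rotation (a stand-in for the learned rotations in SpinQuant; all numbers here are illustrative) spreads an outlier channel's energy across dimensions, so the INT4 scale is no longer dominated by one huge value:

```python
import numpy as np

rng = np.random.default_rng(2)
# Weight vector with one large outlier channel, a common pain point for INT4.
w = rng.standard_normal(64)
w[0] = 20.0

def int4_roundtrip(x):
    # Symmetric INT4 over the whole vector: one scale, levels in [-8, 7].
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

# Random orthogonal rotation via QR (learned rotations would do better).
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))

plain_err = np.linalg.norm(w - int4_roundtrip(w))
# Rotate, quantize in the rotated basis, rotate back.
rot_err = np.linalg.norm(w - Q.T @ int4_roundtrip(Q @ w))
print(f"plain: {plain_err:.2f}  rotated: {rot_err:.2f}")
```

Without the rotation, the outlier forces a coarse scale and the small weights mostly round to zero; after rotating, every coordinate is moderate and quantizes cleanly.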
Personally, I don't think the 1.58-bit work is going to make it into the mainstream.
Not sure why you think fractional representations are only useful for training? Being able to natively compute in lower precisions can be a huge performance boost at inference time.
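A back-of-envelope calculation shows why: autoregressive decoding is typically memory-bandwidth bound, so weight bytes per token roughly set the speed ceiling (the model size and bandwidth below are illustrative assumptions):

```python
# Hypothetical 7B-parameter model on a GPU with 1 TB/s memory bandwidth.
params = 7e9
bandwidth = 1e12  # bytes/s

results = {}
for name, bytes_per_param in [("bf16", 2.0), ("int4", 0.5)]:
    gb = params * bytes_per_param / 1e9
    tok_s = bandwidth / (params * bytes_per_param)  # ceiling: read all weights per token
    results[name] = tok_s
    print(f"{name}: {gb:.1f} GB of weights, ~{tok_s:.0f} tok/s ceiling")
```

Under these assumptions INT4 moves 4x fewer bytes per token than bf16, so the bandwidth-bound throughput ceiling is 4x higher, independent of compute.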