Newer quantization approaches are even better; 4-bit quantization shows no meaningful loss relative to FP16: https://github.com/z-lab/paroquant
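To make "4-bit" concrete, here's a minimal sketch of symmetric per-group INT4 weight quantization (group size, shapes, and the round-to-nearest scheme are illustrative assumptions, not ParoQuant's actual method):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)  # toy weight matrix

def quantize_int4(x, group=32):
    # Symmetric per-group INT4: integer levels in [-8, 7],
    # one FP scale per group of 32 weights.
    xg = x.reshape(-1, group)
    scale = np.abs(xg).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(xg / scale), -8, 7)
    return q, scale

def dequantize(q, scale, shape):
    return (q * scale).reshape(shape)

q, s = quantize_int4(w)
w_hat = dequantize(q, s, w.shape)
err = np.abs(w - w_hat).max()
print(f"max abs round-trip error: {err:.4f}")
```

The per-element error is bounded by half a quantization step, which is why well-behaved (outlier-free) weight groups survive 4 bits so well.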
Hopefully Microsoft keeps pushing BitNet too, so that only "1.58" bits per weight are needed.
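The "1.58" comes from ternary weights: three states per weight is log2(3) ≈ 1.58 bits. A rough sketch of absmean-style ternarization (illustrative, not the official BitNet implementation):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(1024).astype(np.float32)

# Scale by the mean absolute value, then round each weight to the
# nearest of {-1, 0, +1} (absmean-style ternarization, rough sketch).
scale = np.abs(w).mean()
w_ternary = np.clip(np.round(w / scale), -1, 1)

# Three states per weight -> log2(3) ≈ 1.58 bits of information each.
bits_per_weight = math.log2(3)
print(f"{bits_per_weight:.2f} bits/weight")
```

With only {-1, 0, +1}, matrix multiplies reduce to additions and subtractions, which is where the hoped-for efficiency comes from.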
I think fractional representations are only relevant for training at this point; bf16 is sufficient, so there's no need for fp4 and the like.
Learned rotations for INT4 are cool! Seems similar to SpinQuant? https://arxiv.org/abs/2405.16406
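The intuition behind rotation-before-quantization can be shown with a toy example: a random orthogonal rotation (a stand-in for the learned rotations in SpinQuant; all numbers here are illustrative) spreads an outlier channel's energy across dimensions, so the INT4 scale is no longer dominated by one huge value:

```python
import numpy as np

rng = np.random.default_rng(2)
# Weight vector with one large outlier channel, a common pain point for INT4.
w = rng.standard_normal(64)
w[0] = 20.0

def int4_roundtrip(x):
    # Symmetric INT4 over the whole vector: one scale, levels in [-8, 7].
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

# Random orthogonal rotation via QR (learned rotations would do better).
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))

plain_err = np.linalg.norm(w - int4_roundtrip(w))
# Rotate, quantize in the rotated basis, rotate back.
rot_err = np.linalg.norm(w - Q.T @ int4_roundtrip(Q @ w))
print(f"plain: {plain_err:.2f}  rotated: {rot_err:.2f}")
```

Without the rotation, the outlier forces a coarse scale and the small weights mostly round to zero; after rotating, every coordinate is moderate and quantizes cleanly.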
Personally, I don't think the 1.58-bit work is going to make it into the mainstream.
Not sure why you think fractional representations are only useful for training? Being able to natively compute in lower precisions can be a huge performance boost at inference time.
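A back-of-envelope calculation shows why: autoregressive decoding is typically memory-bandwidth bound, so weight bytes per token roughly set the speed ceiling (the model size and bandwidth below are illustrative assumptions):

```python
# Hypothetical 7B-parameter model on a GPU with 1 TB/s memory bandwidth.
params = 7e9
bandwidth = 1e12  # bytes/s

results = {}
for name, bytes_per_param in [("bf16", 2.0), ("int4", 0.5)]:
    gb = params * bytes_per_param / 1e9
    tok_s = bandwidth / (params * bytes_per_param)  # ceiling: read all weights per token
    results[name] = tok_s
    print(f"{name}: {gb:.1f} GB of weights, ~{tok_s:.0f} tok/s ceiling")
```

Under these assumptions INT4 moves 4x fewer bytes per token than bf16, so the bandwidth-bound throughput ceiling is 4x higher, independent of compute.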