Hacker News

buildbot · yesterday at 3:28 PM

Learned rotations for INT4 are cool! Seems similar to SpinQuant? https://arxiv.org/abs/2405.16406
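The intuition behind rotation-based quantization is easy to demo: even a random orthogonal rotation spreads an outlier channel across all dimensions, which shrinks the quantization scale and cuts error for everything else (SpinQuant's contribution is learning the rotation rather than sampling it). A rough numpy sketch of the effect, not the actual ParoQuant/SpinQuant method:

```python
import numpy as np

rng = np.random.default_rng(0)

def quant_int4(w):
    """Symmetric per-tensor INT4: round to 16 levels in [-8, 7], then dequantize."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

d = 128
W = rng.normal(size=(d, d))
W[:, 0] += 30.0  # one outlier channel, the usual pain point in LLM weights

# Random orthogonal rotation via QR decomposition of a Gaussian matrix
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

err_plain = np.linalg.norm(W - quant_int4(W))
# Rotate, quantize, rotate back; Q is orthogonal, so (W @ Q) @ Q.T == W exactly
err_rot = np.linalg.norm(W - quant_int4(W @ Q) @ Q.T)

print(err_plain, err_rot)  # rotation spreads the outlier, so the error drops
```

With the outlier in place, the per-tensor scale is dominated by that one channel and most small weights round to zero; after rotation the dynamic range collapses and the reconstruction error drops by roughly half in this toy setup.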

In my personal opinion, I don't think the 1.58-bit work is going to make it into the mainstream.

Not sure why you think fractional representations are only useful for training? Being able to compute natively in lower precision can be a huge performance boost at inference time.


Replies

naasking · yesterday at 4:47 PM

> Learned rotations for INT4 are cool! Seems similar to SpinQuant? https://arxiv.org/abs/2405.16406

Indeed, but much better! More accurate, with less time and space overhead, and it beats AWQ on almost every benchmark. I hope it becomes the standard.

> In my personal opinion, I don't think the 1.58-bit work is going to make it into the mainstream.

I hope you're wrong! I'm more optimistic. Definitely a bit more work to be done, but still very promising.

> Being able to natively compute in lower precisions can be a huge performance boost at inference time.

ParoQuant is barely worse than FP16; any less precise fractional representation is going to be worse than just using that, IMO.