> Learned rotations for INT4 are cool! Seems similar to SpinQuant? https://arxiv.org/abs/2405.16406
Indeed, but much better! It's more accurate, has lower time and space overhead, and beats AWQ on almost every benchmark. I hope it becomes the standard.
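For anyone wondering why rotating before quantizing helps at all, here's a toy NumPy sketch (my own illustration, not ParoQuant's or SpinQuant's actual algorithm): a random orthogonal rotation spreads outlier channels across the whole vector, so the INT4 scale no longer has to stretch to cover a few huge values, and quantization error on the ordinary channels drops.

```python
import numpy as np

rng = np.random.default_rng(0)

def int4_roundtrip(x):
    # symmetric INT4 quantization: integers in [-8, 7], then dequantize
    scale = np.abs(x).max() / 7
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale

# weight vector with a few outlier channels (common in LLM layers)
w = rng.normal(size=256)
w[:4] *= 50  # outliers inflate the quantization scale

# random orthogonal rotation via QR of a Gaussian matrix
Q, _ = np.linalg.qr(rng.normal(size=(256, 256)))

# direct quantization: the outliers set a huge scale,
# crushing the 252 ordinary channels toward zero
err_plain = np.mean((w - int4_roundtrip(w)) ** 2)

# rotate, quantize, rotate back (Q is orthogonal, so Q.T undoes Q
# and preserves the mean-squared error across the basis change)
err_rot = np.mean((w - Q.T @ int4_roundtrip(Q @ w)) ** 2)

print(err_plain, err_rot)  # rotated error is smaller
```

Learned rotations go further than a random one: they pick the rotation that minimizes the quantization error instead of relying on outlier energy averaging out.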
> In my personal opinion I don’t think the 1.58 bit work is going to make it into the mainstream.
I hope you're wrong! I'm more optimistic. Definitely a bit more work to be done, but still very promising.
> Being able to natively compute in lower precisions can be a huge performance boost at inference time.
ParoQuant is barely worse than FP16. Any lower-precision fractional representation is going to be worse than just using ParoQuant, IMO.