Didn't this paper demonstrate that you only need 1.58 bits to match 16-bit performance?
https://arxiv.org/abs/2402.17764
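For context on where the "1.58" comes from: each weight is ternary, one of {-1, 0, +1}, and log2(3) ≈ 1.58 bits. Below is a minimal sketch of absmean ternary quantization as I understand the paper describes it: scale by the mean absolute weight, round, then clip to [-1, 1]. The function name, the per-tensor scale, and the eps value are my own assumptions for illustration, not the paper's code.

```python
import math


def absmean_ternary(weights, eps=1e-5):
    """Map real-valued weights to {-1, 0, +1} using a per-tensor absmean scale.

    Hypothetical helper sketching the absmean scheme: divide by the mean
    absolute weight (gamma), round to the nearest integer, clip to [-1, 1].
    Returns the ternary weights and gamma (needed to rescale at inference).
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return quantized, gamma


# Three possible values per weight -> log2(3) bits of information each.
BITS_PER_TERNARY_WEIGHT = math.log2(3)

weights = [0.9, -0.05, -1.2, 0.3, 0.0, -0.4]
q, gamma = absmean_ternary(weights)
print(q, round(gamma, 4), round(BITS_PER_TERNARY_WEIGHT, 4))
```

Small weights collapse to 0 and large ones saturate at ±1, so the matmul reduces to additions and subtractions plus one rescale by gamma.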
IIRC the paper was solid, but it still hasn't been adopted or proven out at large scale. It's harder to adapt hardware and write kernels for something like this than for int4.
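A rough sketch of why the kernels are awkward (these helpers are hypothetical, not from the paper or any library): int4 packs two weights per byte and unpacks with a mask and a shift, while getting ternary weights close to 1.58 bits needs something like base-3 packing, five weights per byte (3^5 = 243 ≤ 256), which takes division or a lookup table to unpack.

```python
def pack_int4(values):
    """Pack signed 4-bit ints (-8..7) two per byte, low nibble first."""
    assert len(values) % 2 == 0
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(data):
    """Unpack with masks and shifts only, then sign-extend each nibble."""
    vals = []
    for b in data:
        for nib in (b & 0xF, b >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals


def pack_ternary_base3(trits):
    """Pack ternary weights (-1, 0, +1) five per byte as base-3 digits (1.6 bits each)."""
    assert len(trits) % 5 == 0
    out = bytearray()
    for i in range(0, len(trits), 5):
        code = 0
        # Reverse so the first weight becomes the least significant digit.
        for t in reversed(trits[i:i + 5]):
            code = code * 3 + (t + 1)
        out.append(code)  # max value is 3**5 - 1 = 242, fits in a byte
    return bytes(out)


def unpack_ternary_base3(data):
    """Unpacking needs repeated division by 3 (or a 256-entry lookup table)."""
    trits = []
    for code in data:
        for _ in range(5):
            code, digit = divmod(code, 3)
            trits.append(digit - 1)
    return trits


w4 = [-8, 7, -1, 0]
w3 = [1, 0, -1, 1, 0, -1, -1, 1, 0, 0]
print(len(pack_int4(w4)), len(pack_ternary_base3(w3)))
```

The simpler alternative is 2 bits per ternary weight, which unpacks like int4 but spends 2 bits on ~1.58 bits of information; which trade-off production kernels actually choose is beyond what this sketch shows.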