> I am confused what actually happens in the vectorized ADD and MULT instructions in the GPU with these quantized numbers.
I might be wrong, but I think LLMs are mostly about comparing distances between tokens. You can tell that -255 and +255 are very far apart, but you are also aware that -8 and +8 are very far apart relative to their range.
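
A quick way to see this (a toy example of my own, not from either paper): relative comparisons largely survive shrinking the range, because rescaling plus rounding roughly preserves the angles between vectors.

```python
# Toy example (mine): cosine similarity between two vectors before and
# after coarse quantization from a roughly +/-255 range to +/-8.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([200.0, -150.0, 50.0])
b = np.array([180.0, -100.0, 90.0])

# Rescale and round to the coarser +/-8 grid (4-bit-ish).
qa = np.round(a * 8 / 255)
qb = np.round(b * 8 / 255)

print(cosine(a, b))    # ~0.97 at full range
print(cosine(qa, qb))  # ~0.96, nearly unchanged after quantization
```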
Microsoft's BitNet and Google's TurboQuant show that, in the extreme, you can get away with just -1, 0, +1.
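
To the quoted ADD/MULT question: the appeal of ternary weights is that the multiply half essentially disappears. A minimal sketch (mine, not BitNet's actual GPU kernel):

```python
# Illustrative sketch: a dot product where the weights are restricted to
# {-1, 0, +1}. Each "multiply" degenerates into add / subtract / skip.

def ternary_dot(weights, activations):
    """Dot product with ternary weights: no real multiplications needed."""
    acc = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            acc += x      # +1 * x  ->  add
        elif w == -1:
            acc -= x      # -1 * x  ->  subtract
        # w == 0 contributes nothing -> skip
    return acc

print(ternary_dot([1, 0, -1, 1], [3.5, 2.0, 4.0, -1.5]))  # 3.5 - 4.0 - 1.5 = -2.0
```

Real kernels pack the ternary weights into a couple of bits each and run this as vectorized integer adds, but the arithmetic idea is the same.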