est, today at 7:12 AM

> I am confused what actually happens in the vectorized ADD and MULT instructions in the GPU with these quantized numbers.

I might be wrong, but I think an LLM is all about comparing distances between tokens. You can tell that -255 and +255 are far apart, but you are also aware that -8 and +8 are far apart relative to their own range.
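A minimal sketch of that intuition, assuming symmetric absmax quantization (my choice of scheme for illustration, and the vectors are made up): squeeze the same vectors into an 8-bit-like range and a 4-bit-like range, then check that the cosine similarity barely moves.

```python
import math

def quantize_absmax(vec, max_level):
    """Map floats to integers in [-max_level, max_level] (symmetric absmax scheme)."""
    scale = max(abs(x) for x in vec) / max_level
    return [round(x / scale) for x in vec]

def cosine(a, b):
    """Cosine similarity: the dot product normalized by both vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embedding-like vectors, invented for this example.
query = [0.9, -0.4, 0.1, 0.7]
near = [0.8, -0.5, 0.2, 0.6]
far = [-0.7, 0.6, 0.3, -0.9]

# Levels in [-127, 127] (8-bit-like) and [-7, 7] (4-bit-like).
q8, n8, f8 = (quantize_absmax(v, 127) for v in (query, near, far))
q4, n4, f4 = (quantize_absmax(v, 7) for v in (query, near, far))

for label, q, n, f in (("float", query, near, far), ("int8", q8, n8, f8), ("int4", q4, n4, f4)):
    print(label, round(cosine(q, n), 3), round(cosine(q, f), 3))
```

In this toy case "near" stays near and "far" stays far at every precision, because only the relative geometry matters and the absolute range does not.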

Microsoft BitNet and Google TurboQuant show that in the extreme you can use just -1, 0, +1.
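To tie this back to the ADD and MULT question: with weights restricted to -1, 0, +1, a dot product needs no per-element multiplications, only adds and subtracts plus one rescale at the end. A rough sketch, assuming absmean rounding as the ternary scheme (as I understand the BitNet b1.58 approach; `ternarize` and `ternary_dot` are hypothetical names, and real GPU kernels work on packed integers rather than Python lists):

```python
def ternarize(weights):
    """Round weights to {-1, 0, +1} after dividing by their mean absolute value.

    Assumes at least one weight is nonzero, otherwise gamma would be 0.
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    ternary = [max(-1, min(1, round(w / gamma))) for w in weights]
    return ternary, gamma

def ternary_dot(ternary, gamma, activations):
    """Dot product using only adds and subtracts, then one rescale by gamma."""
    acc = 0.0
    for t, a in zip(ternary, activations):
        if t == 1:
            acc += a
        elif t == -1:
            acc -= a
        # t == 0 contributes nothing, so the term is skipped
    return acc * gamma

weights = [0.42, -0.07, -0.55, 0.31]   # made-up values
activations = [1.0, 2.0, -1.0, 0.5]

ternary, gamma = ternarize(weights)
approx = ternary_dot(ternary, gamma, activations)
exact = sum(w * a for w, a in zip(weights, activations))
print(ternary, approx, exact)
```

The result is only an approximation of the full-precision dot product for weights that were not trained this way. As far as I know, models like BitNet are trained with the ternary constraint in place, so the network learns to work within it.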