Also, there is zero reason to think the big labs haven't had something similar to TurboQuant for a long time already.
The recent blog post from Google announcing TurboQuant does not change anything regarding RAM planning for the big labs.
TurboQuant itself is already a year old! So even smaller labs have probably seen and implemented it.
The open source tooling got quantization support 3 years ago! It was a cruder form of quantization, but more than enough to show that the savings just get spent on bigger models.
TurboQuant's specific benefit is compressing the KV cache at a negligible cost to quality. That mainly means context lengths can go up for the same amount of memory. However, the KV cache only accounts for something like 20% of the overall memory footprint, so this will not dramatically reduce memory demands in the way that some of the more sensationalist reporting has stated.
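To put rough numbers on that, here is a back-of-the-envelope sketch. The layer, head, and dimension figures are illustrative assumptions for a 7B-class transformer, not numbers from the TurboQuant post, and 4-bit is just a plausible quantized width:

```python
# Back-of-the-envelope KV cache arithmetic. Model shape below is an
# illustrative assumption (roughly 7B-class), not from the TurboQuant post.
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bits_per_value=16):
    # K and V each store n_kv_heads * head_dim values per layer per token.
    values_per_token = 2 * n_layers * n_kv_heads * head_dim
    return seq_len * values_per_token * bits_per_value // 8

fp16 = kv_cache_bytes(8192)                    # baseline 16-bit cache
int4 = kv_cache_bytes(8192, bits_per_value=4)  # hypothetical 4-bit cache
print(f"fp16 KV cache at 8k tokens:  {fp16 / 2**30:.2f} GiB")  # 4.00 GiB
print(f"4-bit KV cache at 8k tokens: {int4 / 2**30:.2f} GiB")  # 1.00 GiB

# Same memory budget buys 4x the context once the cache is 4-bit. But if
# the cache is only ~20% of total memory use, the end-to-end saving is
# roughly 0.2 * (1 - 1/4) = 15% — not a dramatic drop.
```

In other words, a 4x compression of a 20% slice frees about 15% of total memory: nice to have, nothing like the headline numbers.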