TurboQuant's specific benefit is compressing the KV cache at negligible cost to quality. In practice that means context lengths can grow for the same memory budget. However, the KV cache only accounts for something like 20% of the overall memory footprint in typical deployments, so this will not dramatically decrease memory demands in the way that some of the more sensationalist reporting has suggested.
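For a rough sense of the proportions, here's a back-of-the-envelope sketch. All the model dimensions are illustrative assumptions (roughly a 7B-class, Llama-2-shaped model in fp16), not anything from the TurboQuant paper:

```python
# Back-of-the-envelope KV cache sizing vs. weight memory.
# All dimensions below are assumed, 7B-class values, purely illustrative.

n_layers   = 32     # transformer blocks
n_kv_heads = 32     # KV heads (no grouped-query attention assumed)
head_dim   = 128    # per-head dimension
bytes_elem = 2      # fp16/bf16 element size
n_params   = 7e9    # total weight count

seq_len = 4096      # context length
batch   = 1         # concurrent sequences

# K and V each store n_layers * n_kv_heads * head_dim values per token.
kv_bytes     = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_elem
weight_bytes = n_params * bytes_elem

print(f"weights:  {weight_bytes / 2**30:.1f} GiB")
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB "
      f"({100 * kv_bytes / (kv_bytes + weight_bytes):.0f}% of the total)")
```

At batch 1 and a 4k context this gives ~13 GiB of weights against ~2 GiB of cache, in the same ballpark as the 20% figure. But the cache grows linearly with both batch size and context length while the weights stay fixed, so at high batch sizes or long contexts it can dominate.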
For large providers serving many concurrent requests, isn't the KV cache the main memory bottleneck, though?