zozbot234 • yesterday at 8:15 PM

TurboQuant helps with KV-cache quantization, which is not very relevant to local LLMs: the KV cache grows with context length times batch size, so it matters most when you run inference with large batches. For small-scale inference, weights dominate. (Even if you stream weights from SSD, you'll want to cache a sizeable fraction of them in memory to get workable throughput, and that cache dominates your memory usage.)
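
To put rough numbers on that, here is a back-of-the-envelope sketch. The model shape is an assumption picked for illustration (a hypothetical 8B-parameter model with 32 layers, 8 KV heads, head dimension 128, 4-bit weights, 16-bit KV cache, 8192-token context). It is not taken from TurboQuant or any specific model, and the function names are made up for this example.

```python
GIB = 1024 ** 3

def weight_bytes(n_params, bits_per_weight):
    """Memory needed to hold the model weights."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers, n_kv_heads, head_dim,
                   context_len, batch_size, bits_per_elem):
    """Memory needed for the KV cache: one key and one value vector
    per layer, per KV head, per token, per sequence in the batch."""
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len * batch_size
    return elems * bits_per_elem / 8

# Hypothetical model shape (assumed values, for illustration only).
N_PARAMS = 8_000_000_000
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128
CONTEXT_LEN = 8192

weights = weight_bytes(N_PARAMS, bits_per_weight=4)

for batch_size in (1, 64):
    kv = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM,
                        CONTEXT_LEN, batch_size, bits_per_elem=16)
    print(f"batch={batch_size:>2}: weights {weights / GIB:.1f} GiB, "
          f"KV cache {kv / GIB:.1f} GiB")
```

Under these assumed numbers, a single sequence needs less memory for its KV cache than for the weights, while at batch size 64 the KV cache is many times larger than the weights. A much longer context shifts the balance even at batch size 1, so treat this as an estimate for one configuration, not a general rule.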
