Does the KV cache really grow to use more memory than the model weights? The reduction in overall RAM relies on the KV cache being a substantial proportion of the memory usage but with very large models I can't see how that holds true.
For long context, yes, this is at least plausible. The KV cache grows linearly with sequence length while the weights are fixed, so past some context length the cache always dominates. And the latest models are reaching context lengths of 1M tokens or perhaps more.
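To make that concrete, here's a minimal back-of-envelope sketch (not from the original exchange), assuming a Llama-2-7B-like configuration in fp16; the specific model and all the numbers are illustrative assumptions:

```python
# Back-of-envelope KV cache size vs. weight size.
# Assumes a Llama-2-7B-like config in fp16 (illustrative numbers only).
n_layers   = 32      # transformer layers
n_kv_heads = 32      # KV heads (no grouped-query attention here)
head_dim   = 128     # dimension per head
bytes_elem = 2       # fp16/bf16

# Each token stores one K and one V vector per layer.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_elem
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")  # ~512 KiB

weights_bytes = 7e9 * bytes_elem  # ~14 GB of fp16 weights

for context in (8_192, 128_000, 1_000_000):
    cache_bytes = context * kv_bytes_per_token
    print(f"{context:>9} tokens -> KV cache ~{cache_bytes / 1e9:,.0f} GB "
          f"(weights ~{weights_bytes / 1e9:.0f} GB)")
```

Under these assumptions the cache overtakes the ~14 GB of weights somewhere under 30K tokens, and at 1M tokens it would be hundreds of GB. Grouped-query attention shrinks `n_kv_heads` (often 4-8x fewer), which cuts the per-token cost proportionally, but the linear growth still overtakes the fixed weight footprint at long enough context.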