
tmountain • yesterday at 9:23 AM • 1 reply

Related question: is it at all feasible to store the cache locally to offload memory costs, and then send it over the wire when needed?

Replies

dev_hugepages • yesterday at 11:45 AM

No, the cache is a few GB for most typical context sizes. It depends on the model architecture, but if you take Gemma 4 31B at a 256K context length, the cache takes 11.6 GB.

Note: I took these values from a blog and they may be inaccurate, but in pretty much every model the KV cache is very large; it's probably even larger in Claude.
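A rough way to see why the cache gets this big: in a standard decoder with grouped-query attention, the KV cache holds one key and one value vector per layer, per KV head, per token. The sketch below estimates that size and how long it would take to push it over a network link. The config values are hypothetical (not Gemma's or Claude's real parameters), and the formula ignores sliding-window layers, cache quantization, and framework/paging overhead, so real numbers can differ substantially.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    """Estimate KV cache size for a plain full-attention (MHA/GQA) decoder.

    The leading factor 2 is for storing both keys and values at every layer.
    bytes_per_elem=2 assumes fp16/bf16 storage. Ignores sliding-window
    attention, cache quantization, and allocator/paging overhead.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch_size


def transfer_seconds(num_bytes, link_gbit_per_s):
    """Time to move num_bytes over a link of the given speed (gigabits/s),
    ignoring protocol overhead and latency."""
    return num_bytes * 8 / (link_gbit_per_s * 1e9)


# Hypothetical config for illustration only, NOT any specific model's real numbers.
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32 * 1024)
size = kv_cache_bytes(**cfg)
print(f"KV cache: {size / 2**30:.1f} GiB")  # → KV cache: 4.0 GiB
for gbit in (1, 10, 100):
    print(f"{gbit:>4} Gbit/s: {transfer_seconds(size, gbit):.2f} s")
```

With these made-up numbers the cache is 128 KiB per token, so a 32K-token context is 4 GiB: tens of seconds on a 1 Gbit/s link and a few seconds on 10 Gbit/s, before counting any protocol overhead.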
