
zozbot234 • today at 10:51 AM

That's pretty nice actually. How much KV cache does that model require at full context? That tends to be the main limit on running concurrent requests locally. There's KV-cache quantization, but it has an outsized negative impact on model quality.
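For a rough sense of scale, KV-cache size follows standard arithmetic. Keys and values (factor 2) are stored for every layer, KV head, head dimension, and token position, and the total is multiplied by bytes per element and by the number of concurrent sequences. The sketch below uses a hypothetical model shape (32 layers, 8 KV heads, head_dim 128, 32k context), not the model being discussed. The helper name `kv_cache_bytes` is illustrative only.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   bytes_per_elem=2, n_sequences=1):
    """Estimate KV-cache memory in bytes.

    Factor 2 = one key tensor and one value tensor per layer.
    Ignores per-block scale overhead that real quantized caches add.
    """
    return (2 * n_layers * n_kv_heads * head_dim * context_len
            * bytes_per_elem * n_sequences)

# Hypothetical model shape, chosen only for illustration.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32768)

fp16 = kv_cache_bytes(**shape, bytes_per_elem=2)
q8 = kv_cache_bytes(**shape, bytes_per_elem=1)
four_fp16 = kv_cache_bytes(**shape, bytes_per_elem=2, n_sequences=4)

print(f"fp16, 1 sequence:  {fp16 / 2**30:.1f} GiB")       # 4.0 GiB
print(f"8-bit, 1 sequence: {q8 / 2**30:.1f} GiB")         # 2.0 GiB
print(f"fp16, 4 sequences: {four_fp16 / 2**30:.1f} GiB")  # 16.0 GiB
```

Because the cost scales linearly with both context length and the number of concurrent sequences, full-context concurrency can exhaust memory quickly. Halving the bytes per element through quantization is the obvious lever, which is why the quality trade-off matters.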
