Hacker News

mike_hearn today at 10:30 AM

Cache definitely isn't free! We're in a global RAM shortage and KV caches sit around consuming RAM in the hope that there will be a hit.

The gamble with caching is that you hold a KV cache in the hope that (a) the user will submit a prompt that can use it, (b) that prompt will get routed to the right server, and (c) that server won't be too busy at the time to handle the request. KV caches aren't small, so if you lose that bet you've lost money (basically, the opportunity cost of using that RAM for something else).
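
To put rough numbers on that bet, here is a minimal Python sketch. It assumes a standard transformer KV cache (one key and one value vector per layer, per KV head, per token) and treats the three conditions as independent probabilities. The function names, model dimensions, probabilities and costs are all made up for illustration, not taken from any real deployment.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_value=2):
    """Approximate size of one sequence's KV cache.

    The leading factor of 2 covers keys plus values; bytes_per_value=2
    assumes 16-bit storage (an assumption, other precisions exist).
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_value


def expected_net_value(p_reuse, p_routed, p_not_busy, saving_per_hit, holding_cost):
    """Expected payoff of holding a cache entry, in arbitrary cost units.

    A hit needs all three conditions at once, so (assuming independence)
    the hit probability is their product. holding_cost stands in for the
    opportunity cost of the memory while the entry sits idle.
    """
    p_hit = p_reuse * p_routed * p_not_busy
    return p_hit * saving_per_hit - holding_cost


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads, head dimension 128, 32k-token prompt.
    size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=32768)
    print(f"KV cache size: {size / 2**30:.1f} GiB")

    # Hypothetical odds and costs: the bet only pays off if this is positive.
    value = expected_net_value(
        p_reuse=0.5, p_routed=0.8, p_not_busy=0.9,
        saving_per_hit=1.0, holding_cost=0.2,
    )
    print(f"Expected net value of holding the entry: {value:.2f}")
```

With numbers like these, the entry is worth keeping only while the combined hit probability times the compute saved exceeds what the memory would have earned elsewhere.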


Replies

otterley today at 1:06 PM

Why do you believe that caches are held in RAM? They don’t need RAM performance, and disk is orders of magnitude cheaper.
