Hacker News

krackers yesterday at 10:03 PM

>pay for reinitializing the cache

Why can't they save the KV cache to disk and later reload it into memory?
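The question doesn't say how a provider actually stores its cache. As a minimal sketch, assume one session's cache is held as NumPy arrays of keys and values; the `(n_layers, n_kv_heads, seq_len, head_dim)` shape and the helpers `save_kv_cache`/`load_kv_cache` are hypothetical. Saving and reloading could look like this. The token IDs are stored alongside because the cache is only valid for exactly those tokens.

```python
import os
import tempfile

import numpy as np


def save_kv_cache(path: str, keys: np.ndarray, values: np.ndarray, token_ids) -> None:
    """Write one session's KV cache plus the tokens it was computed from to an .npz file."""
    np.savez(path, keys=keys, values=values,
             token_ids=np.asarray(token_ids, dtype=np.int64))


def load_kv_cache(path: str):
    """Read the arrays back into host memory; copying them to the GPU is a separate step."""
    with np.load(path) as data:
        return data["keys"], data["values"], data["token_ids"]


# Hypothetical toy shape: (n_layers, n_kv_heads, seq_len, head_dim), stored as fp16.
rng = np.random.default_rng(0)
shape = (2, 4, 16, 8)
keys = rng.standard_normal(shape).astype(np.float16)
values = rng.standard_normal(shape).astype(np.float16)
token_ids = list(range(16))

# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "session.npz")
    save_kv_cache(path, keys, values, token_ids)
    restored_keys, restored_values, restored_tokens = load_kv_cache(path)
```

This only brings the data back into host memory. Whether that is cheaper than recomputing depends on how large the cache is and how fast it can be moved onto the GPU.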


Replies

stavros yesterday at 11:57 PM

Probably because the costly operation is loading it onto the GPU; it doesn't matter whether it comes from disk or from your request.
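To put rough numbers on what "loading it onto the GPU" involves, the standard size of a KV cache is one key vector and one value vector per layer, per KV head, per token. The model configuration below (80 layers, 8 KV heads, head dimension 128, fp16) and the 25 GB/s host-to-GPU bandwidth are illustrative assumptions, not figures for any particular provider.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """Size of a KV cache: a key and a value vector per layer, KV head and token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


def transfer_seconds(n_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to copy n_bytes at an assumed sustained bandwidth."""
    return n_bytes / bandwidth_bytes_per_s


# Hypothetical config: 80 layers, 8 KV heads, head_dim 128, fp16 (2 bytes per element).
per_token = kv_cache_bytes(80, 8, 128, n_tokens=1)
full = kv_cache_bytes(80, 8, 128, n_tokens=100_000)

# Assumed 25 GB/s host-to-GPU bandwidth; real figures vary widely.
seconds = transfer_seconds(full, 25e9)
```

Under these assumptions the cache is 320 KiB per token, about 32.8 GB for a 100,000-token context, and roughly 1.3 s just to copy it at the assumed bandwidth.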

stingraycharles today at 2:04 AM

It’s a shitload of data, and it only works if all the prefix tokens are 100% identical, i.e. all the cached attention keys and values are exactly the same.
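Why exact identity matters can be sketched with a block-level prefix cache, loosely in the spirit of what some inference servers do. `BLOCK_SIZE`, `block_hashes` and `PrefixCache` are hypothetical names. Each block's hash covers every token before it, so changing any earlier token changes the hash of that block and of every later block, and none of those entries can be reused.

```python
import hashlib
from typing import Dict, List, Sequence

BLOCK_SIZE = 4  # hypothetical; real systems typically use larger blocks


def block_hashes(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """One hash per full block, each covering the entire prefix up to that block."""
    hashes = []
    h = hashlib.sha256()
    full_len = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full_len, block_size):
        block = tokens[start:start + block_size]
        h.update(",".join(map(str, block)).encode() + b";")
        hashes.append(h.copy().hexdigest())  # snapshot of the running (chained) hash
    return hashes


class PrefixCache:
    """Maps prefix hashes to opaque per-block KV handles."""

    def __init__(self) -> None:
        self._blocks: Dict[str, object] = {}

    def insert(self, tokens: Sequence[int], kv_blocks: List[object]) -> None:
        for h, kv in zip(block_hashes(tokens), kv_blocks):
            self._blocks[h] = kv

    def longest_prefix(self, tokens: Sequence[int]) -> int:
        """Number of leading tokens whose KV blocks are already cached."""
        n = 0
        for h in block_hashes(tokens):
            if h not in self._blocks:
                break  # first mismatch: nothing after this point can be reused
            n += BLOCK_SIZE
        return n
```

After caching `[1, 2, 3, 4, 5, 6, 7, 8]`, a request that extends it reuses all 8 tokens, a request that differs only in the last token reuses the first block (4 tokens), and a request that differs in the very first token reuses nothing.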

Typically it’s cached for about 5 minutes; you can pay extra for longer cache lifetimes.
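The expiry behaviour can be sketched as a simple time-to-live cache. This is a minimal sketch under stated assumptions: the `TTLCache` class is hypothetical, the default lifetime is the 300 seconds mentioned above, a longer paid tier is just a larger `ttl_seconds`, and resetting the lifetime on every hit is an extra assumption rather than something the comment states. The clock is injectable so expiry can be tested without waiting.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Entries expire ttl_seconds after their last put or hit (hit refresh is an assumption)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[object, float]] = {}

    def put(self, key: Hashable, value: object) -> None:
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        now = self.clock()
        if now >= expires_at:
            del self._entries[key]  # expired: the next request pays to rebuild the cache
            return None
        # Assumption: each hit extends the lifetime by another full TTL.
        self._entries[key] = (value, now + self.ttl)
        return value
```

With the default 300-second TTL, an entry used every few minutes stays warm indefinitely, while one left idle for five minutes is dropped; a longer tier would be `TTLCache(ttl_seconds=3600)`.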
