Hacker News

vanviegen, today at 8:49 AM

They probably already do that. But these caches can get pretty big (10s of GBs per session), so that adds up fast, even for cold storage.


Replies

kovek, today at 6:21 PM

10s of GBs? (1,000,000 context * 1,000 vector size)^2 = 1,000,000,000,000,000,000... oh wow. I must be miscalculating.
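Nothing gets squared when sizing a KV cache: for every token, each layer stores one key vector and one value vector per KV head, so the cache grows linearly with context length. A minimal Python sketch, where the model dimensions are purely illustrative assumptions (roughly the shape of a 70B-class model with grouped-query attention, fp16 values):

```python
# Rough KV-cache size estimate. All model dimensions are illustrative
# assumptions, not the configuration of any specific deployed model.

def kv_cache_bytes(tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int) -> int:
    """Bytes needed to cache keys and values for `tokens` tokens.

    The factor of 2 counts one key vector and one value vector per
    KV head, per layer, per token. Size grows linearly with `tokens`.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * tokens


# Hypothetical configuration: 80 layers, 8 KV heads of width 128, fp16 (2 bytes).
CONFIG = dict(n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2)

per_token = kv_cache_bytes(1, **CONFIG)
print(f"per token: {per_token / 1024:.0f} KiB")  # → per token: 320 KiB

for tokens in (100_000, 1_000_000):
    size_gb = kv_cache_bytes(tokens, **CONFIG) / 1e9
    print(f"{tokens:>9,} tokens: {size_gb:,.1f} GB")
```

With this hypothetical configuration, 100,000 tokens come to about 32.8 GB and 1,000,000 tokens to about 327.7 GB, consistent with "10s of GBs per session". Attention does involve a context-length-squared term (the attention score matrix), but it is computed on the fly and not kept in the cache.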

What about storing only the conversation and then recomputing the cached embeddings from it? Would that cost a lot? Doing a lot of matrix multiplication doesn't cost dollars of compute, especially on specialized hardware, right?
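Rebuilding the cache from the stored text means re-running the model's forward pass (prefill) over the whole conversation. A minimal Python sketch of that cost, assuming the common rule of thumb of about 2 FLOPs per parameter per token; the model size, accelerator peak, and utilization below are hypothetical placeholders, not measurements:

```python
# Back-of-envelope cost of re-running prefill to rebuild a KV cache from the
# stored conversation. Uses the ~2 * params FLOPs-per-token rule of thumb for a
# forward pass. It ignores the attention term that grows with context length,
# so it underestimates long-context cost. Hardware numbers are hypothetical.

def prefill_flops(params: float, tokens: int) -> float:
    """Approximate forward-pass FLOPs: ~2 FLOPs per parameter per token."""
    return 2.0 * params * tokens


def prefill_seconds(params: float, tokens: int,
                    peak_flops: float, utilization: float) -> float:
    """Wall-clock estimate at a given peak throughput and achieved utilization."""
    return prefill_flops(params, tokens) / (peak_flops * utilization)


PARAMS = 70e9       # assumed model size: 70B parameters
PEAK_FLOPS = 1e15   # hypothetical accelerator peak: 1 PFLOP/s
UTILIZATION = 0.5   # hypothetical fraction of peak actually achieved

for tokens in (10_000, 100_000):
    flops = prefill_flops(PARAMS, tokens)
    secs = prefill_seconds(PARAMS, tokens, PEAK_FLOPS, UTILIZATION)
    print(f"{tokens:>7,} tokens: {flops:.1e} FLOPs, ~{secs:.1f} s on one accelerator")
```

With these placeholder numbers it prints about 1.4e+15 FLOPs (~2.8 s) for 10,000 tokens and 1.4e+16 FLOPs (~28.0 s) for 100,000 tokens. The raw compute may be cheap, but those seconds are added latency before the first reply token, and the real cost grows faster than linearly at long contexts because of the attention term the sketch leaves out.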
