vanviegen • today at 8:49 AM

They probably already do that. But these caches can get pretty big (10s of GBs per session), so that adds up fast, even for cold storage.

Replies

kovek • today at 6:21 PM

10s of GBs? (1,000,000 context * 1,000 vector size)^2 = 1,000,000,000,000,000,000… oh wow. I must be miscalculating.

What about only storing the conversation and then recomputing the embeddings in the cache? Does that cost a lot? Doing a lot of matrix multiplication does not cost dollars of compute, especially on specialized hardware, right?
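
As a back-of-the-envelope check on both questions, here is a minimal sketch. It assumes a standard transformer KV cache (one key and one value vector per token, per layer, per KV head) and the common rule of thumb of roughly 2 FLOPs per parameter per token for recomputing the cache. The model shape below (80 layers, 8 KV heads, head dimension 128, 70B parameters, 16-bit values) is hypothetical and not taken from any provider discussed in the thread; `kv_cache_bytes` and `prefill_flops` are illustrative names.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size in bytes.

    The leading factor of 2 counts one key and one value vector
    per token, per layer, per KV head.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


def prefill_flops(tokens, params):
    """Rough cost of recomputing the cache from the stored conversation.

    Uses ~2 FLOPs per parameter per token and ignores the attention
    term, which grows with context length, so this is a lower bound.
    """
    return 2 * params * tokens


# Hypothetical model shape (assumption, see the note above).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
PARAMS = 70_000_000_000
TOKENS = 100_000

size_gb = kv_cache_bytes(TOKENS, LAYERS, KV_HEADS, HEAD_DIM) / 1e9
flops = prefill_flops(TOKENS, PARAMS)

print(f"KV cache: {size_gb:.3f} GB")      # KV cache: 32.768 GB
print(f"Recompute: {flops:.1e} FLOPs")    # Recompute: 1.4e+16 FLOPs
```

Under these assumptions the cache grows linearly with context length rather than with its square, and a 100,000-token session lands in the tens of GB. Whether recomputing is cheaper than storing depends on hardware and pricing that the thread does not state.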
