Hacker News

yumraj · yesterday at 12:09 AM

Where are you getting 60GB from? It shouldn’t be that large.

But yes, would love to save context/cache such that it can be played back/referred to if needed.

/compact is a little black box that I just have to trust is keeping the important bits.


Replies

davmre · yesterday at 2:07 AM

The KV cache stores a key vector and a value vector for every attention head at every layer of the model for every token, so it gets quite large. ChatGPT also estimates 60-100GB for the full token context of an Opus-sized model:

https://chatgpt.com/share/69dc5030-268c-83e8-92c2-6cef962dc5...
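The arithmetic behind an estimate like that is simple to sketch. The config below is entirely hypothetical (Anthropic hasn't published Opus's architecture): 96 layers, grouped-query attention with 8 KV heads of dimension 128, an fp16 cache, and a 200k-token context.

```python
# Back-of-envelope KV-cache size estimate. Every parameter here is an
# assumption for illustration, not a published figure for any real model.
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, dtype_bytes: int = 2) -> int:
    # Factor of 2: one key vector and one value vector per head,
    # per layer, per token.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * dtype_bytes

# Hypothetical Opus-scale config: 96 layers, 8 KV heads (grouped-query
# attention) of dimension 128, fp16 (2 bytes), 200k tokens of context.
size = kv_cache_bytes(n_layers=96, n_kv_heads=8, head_dim=128,
                      n_tokens=200_000)
print(f"{size / 1e9:.1f} GB")  # ~78.6 GB
```

With those made-up numbers the cache lands squarely in the 60-100GB range; a model using full multi-head attention instead of grouped-query would multiply that by the ratio of query heads to KV heads.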
