Hacker News

yumraj · yesterday at 12:09 AM

Where are you getting 60GB from? It shouldn’t be that large.

But yes, would love to save context/cache such that it can be played back/referred to if needed.

/compact is a little black box that I just have to trust is keeping the important bits.


Replies

davmre · yesterday at 2:07 AM

The KV cache stores a key vector and a value vector for every attention head at every layer of the model for every token, so it gets quite large. ChatGPT also estimates 60-100GB for the full token context of an Opus-sized model:

https://chatgpt.com/share/69dc5030-268c-83e8-92c2-6cef962dc5...
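The arithmetic behind an estimate like that is simple to sketch. The config below is entirely hypothetical (Anthropic hasn't published Opus's architecture): 96 layers, grouped-query attention with 8 KV heads of dimension 128, an fp16 cache, and a 200k-token context.

```python
# Back-of-envelope KV-cache size estimate. Every parameter here is an
# assumption for illustration, not a published figure for any real model.
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, dtype_bytes: int = 2) -> int:
    # Factor of 2: one key vector and one value vector per head,
    # per layer, per token.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * dtype_bytes

# Hypothetical Opus-scale config: 96 layers, 8 KV heads (grouped-query
# attention) of dimension 128, fp16 (2 bytes), 200k tokens of context.
size = kv_cache_bytes(n_layers=96, n_kv_heads=8, head_dim=128,
                      n_tokens=200_000)
print(f"{size / 1e9:.1f} GB")  # ~78.6 GB
```

With those made-up numbers the cache lands squarely in the 60-100GB range; a model using full multi-head attention instead of grouped-query would multiply that by the ratio of query heads to KV heads.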
