dd8601fn • yesterday at 2:10 PM • 1 reply

Seems to me that if the model and the KV cache are competing for the same pool of memory, then massively compressing the cache necessarily means more RAM available for a larger model (if it fits), no?

Replies

delecti • yesterday at 2:22 PM

Yes, but the context (the KV cache) is a comparatively small share of total memory use when running a model locally for a single user, versus running it on a server for public... serving.
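To put rough numbers on that, here is a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128, 8B parameters, fp16 everywhere), the 8192-token context, and the 64 concurrent users are all illustrative assumptions, not figures from this thread. The sizing formula is the usual one for a standard attention KV cache: 2 (keys and values) x layers x KV heads x head dim x tokens x sequences x bytes per element.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Size of an uncompressed KV cache; the leading 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


def weight_bytes(n_params, bytes_per_param=2):
    """Size of the model weights at a given precision (2 bytes = fp16)."""
    return n_params * bytes_per_param


def cache_share(cache, weights):
    """Fraction of (weights + cache) memory taken by the cache."""
    return cache / (cache + weights)


# Hypothetical model shape, chosen only for illustration.
CFG = dict(n_layers=32, n_kv_heads=8, head_dim=128)
WEIGHTS = weight_bytes(8_000_000_000)

# Same context length, different number of concurrent sequences.
local = kv_cache_bytes(seq_len=8192, batch=1, **CFG)
server = kv_cache_bytes(seq_len=8192, batch=64, **CFG)

for label, cache in (("local, 1 user", local), ("server, 64 users", server)):
    share = cache_share(cache, WEIGHTS)
    print(f"{label}: cache {cache / GIB:.1f} GiB, {share:.0%} of weights + cache")
```

Under these assumptions the cache is about 1 GiB (roughly 6% of the total) for the single local user and about 64 GiB (roughly 81%) for the server, so compressing the cache frees little room for a bigger model locally but a lot on a shared server.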
