It'd be terribly compute-inefficient not to share prefix caches (KV caches) across customers.

woadwarrior01 • today at 3:12 PM • 1 reply

Replies

What is the probability that two customers will have exactly the same tokens in cache? Wouldn't it require using the exact same CLAUDE.md, skills, MCPs and context? After that it gets even worse, given the nondeterminism of both LLMs and humans.
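
For what it's worth, the question can be made concrete with a small sketch. This assumes a block-level prefix cache keyed by chained hashes, which is how open-source servers commonly describe the technique; it is not a claim about how any particular provider does it. The names `BLOCK`, `block_hashes`, and `shared_blocks` are made up for illustration. Under that assumption, two requests don't need identical contexts to share cache, only an identical leading run of tokens, and sharing stops at the first block that differs.

```python
import hashlib

BLOCK = 4  # hypothetical block size in tokens (assumption, chosen small for the demo)

def block_hashes(tokens, block=BLOCK):
    """Return one chained hash per full block; each hash covers all tokens before it."""
    hashes, prev = [], b""
    full = len(tokens) - len(tokens) % block  # ignore the trailing partial block
    for i in range(0, full, block):
        chunk = ",".join(map(str, tokens[i:i + block])).encode()
        prev = hashlib.sha256(prev + b"|" + chunk).digest()
        hashes.append(prev)
    return hashes

def shared_blocks(a, b):
    """Count the leading blocks two token sequences could reuse from one cache."""
    n = 0
    for ha, hb in zip(block_hashes(a), block_hashes(b)):
        if ha != hb:
            break
        n += 1
    return n

system_prompt = list(range(8))                 # stand-in for a common system prompt
req_a = system_prompt + [100, 101, 102, 103]   # customer A's own context
req_b = system_prompt + [200, 201, 202, 203]   # customer B's own context
print(shared_blocks(req_a, req_b))             # blocks covering the common prefix
```

In this toy model anything customer-specific placed at the very start (say, a different first token) drops the overlap to zero, which is the scenario the question is getting at.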

