Hacker News

woadwarrior01 today at 3:12 PM

It'd be terribly compute-inefficient not to share prefix caches (KV cache) across customers.


Replies

acepl today at 3:21 PM

What is the probability that two customers will have exactly the same tokens in cache? Wouldn't it require using the exact same CLAUDE.md, skills, MCPs and context? After that it gets even worse because of the nondeterminism of LLMs and humans.
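For anyone following along, here is a minimal sketch of how a block-level prefix cache is commonly described, to make the question concrete. Everything in it is an assumption for illustration: `PrefixCache`, `block_hashes`, the block size of 4, and the placeholder string standing in for KV tensors are hypothetical and not taken from any provider's implementation. In this toy model a lookup matches only the leading blocks of a request, so how much two requests can share depends on how many tokens they have in common at the very start.

```python
import hashlib
from typing import Dict, List, Sequence


def block_hashes(tokens: Sequence[int], block_size: int) -> List[str]:
    """Hash each full block chained with the hash of everything before it."""
    hashes: List[str] = []
    prev = ""
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        data = prev + "|" + ",".join(str(t) for t in block)
        prev = hashlib.sha256(data.encode()).hexdigest()
        hashes.append(prev)
    return hashes


class PrefixCache:
    """Toy prefix cache: maps chained block hashes to a KV placeholder."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size  # hypothetical value, chosen for readability
        self._blocks: Dict[str, str] = {}

    def insert(self, tokens: Sequence[int]) -> None:
        # Store one entry per full block of the request.
        for h in block_hashes(tokens, self.block_size):
            self._blocks.setdefault(h, "kv-placeholder")

    def lookup(self, tokens: Sequence[int]) -> int:
        """Return how many leading tokens are covered by cached blocks."""
        hit = 0
        for h in block_hashes(tokens, self.block_size):
            if h not in self._blocks:
                break  # the first miss ends the reusable prefix
            hit += self.block_size
        return hit


# Two hypothetical requests that share an 8-token start and then diverge.
shared_start = list(range(8))
request_a = shared_start + [100, 101, 102, 103]
request_b = shared_start + [200, 201, 202, 203]

cache = PrefixCache(block_size=4)
cache.insert(request_a)
reused_tokens = cache.lookup(request_b)  # counts only the leading blocks in common
```

Whether such a cache is shared across customers or kept per account is a policy choice that this sketch does not model.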
