zozbot234 • last Sunday at 12:12 AM

The point of prompt caching is to save on prefill, which for large contexts (common in agentic workloads) is quite expensive per token. So there is a context length beyond which storing the KV cache is worth it, because loading it back in is cheaper than recomputing it. For larger SOTA models, the KV cache size per token is also much smaller relative to the compute cost of prefill, so caching becomes worthwhile even at shorter context lengths.
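
To make the break-even argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a deliberately simple cost model that is not from any particular serving stack: prefill costs about 2 FLOPs per parameter per token (ignoring the quadratic attention term), and reloading a cached prefix costs a fixed latency plus a per-token transfer time. All function names and parameters are hypothetical, chosen only for illustration.

```python
import math


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache stored per token: one K and one V vector per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def prefill_seconds(n_tokens, n_params, flops_per_sec):
    """Approximate recompute time, assuming ~2 FLOPs per parameter per token."""
    return 2 * n_params * n_tokens / flops_per_sec


def load_seconds(n_tokens, kv_bytes, bytes_per_sec, fixed_latency):
    """Approximate time to load a cached prefix: fixed overhead plus transfer."""
    return fixed_latency + n_tokens * kv_bytes / bytes_per_sec


def break_even_tokens(n_params, flops_per_sec, kv_bytes, bytes_per_sec, fixed_latency):
    """Smallest context length at which loading is no slower than recomputing.

    Returns None if loading a token's KV entry is never cheaper than
    recomputing it under this model.
    """
    saving_per_token = 2 * n_params / flops_per_sec - kv_bytes / bytes_per_sec
    if saving_per_token <= 0:
        return None
    return math.ceil(fixed_latency / saving_per_token)


# Hypothetical numbers, for illustration only (not measurements).
kv = kv_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128)
n = break_even_tokens(
    n_params=70e9,        # assumed model size
    flops_per_sec=4e14,   # assumed effective prefill throughput
    kv_bytes=kv,
    bytes_per_sec=2e9,    # assumed cache-storage read bandwidth
    fixed_latency=0.05,   # assumed per-request load overhead, in seconds
)
print(f"KV bytes per token: {kv}, break-even context: {n} tokens")
```

Under this model, increasing `n_params` while keeping the KV shape fixed raises the per-token saving and so lowers the break-even length, which is the "larger models" point above. Real systems would also need to account for storage cost and cache hit rate, which this sketch leaves out.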
