Hacker News

sharts | yesterday at 10:50 PM

It doesn’t make sense to pay more for cache warming. Your session is, for the most part, already persisted. Why would it be reasonable to pay again to continue where you left off at any time in the future?


Replies

jeremyjh | yesterday at 11:22 PM

Because it significantly increases actual costs for Anthropic.

If they ignored this, the users who don’t do it much would end up subsidizing the people who do.

danso | today at 2:15 AM

Genuine question: is the cost of keeping a persistent warmed cache for sessions that idle for hours or days not significant when done for hundreds of thousands of users? Wouldn’t it become a resource constraint for Anthropic at some point?
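
For a sense of scale, here is a rough back-of-envelope sketch in Python. The layer count, head count, precision, context length, and session count are illustrative assumptions, not anything Anthropic has disclosed; the point is only how quickly resident KV-cache state adds up.

    # Rough back-of-envelope for how much memory a warmed KV cache ties up
    # per idle session. All model dimensions below are illustrative
    # assumptions, not Anthropic's actual architecture.
    layers = 80          # transformer layers (assumed)
    kv_heads = 8         # grouped-query KV heads (assumed)
    head_dim = 128       # per-head dimension (assumed)
    bytes_per_value = 2  # fp16/bf16
    tokens = 100_000     # a long coding session's context (assumed)

    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # keys + values
    per_session_gb = per_token * tokens / 1e9
    print(f"{per_session_gb:.1f} GB per session")   # ~32.8 GB with these numbers

    idle_sessions = 500_000
    total_pb = per_session_gb * idle_sessions / 1e6
    print(f"{total_pb:.1f} PB across {idle_sessions:,} idle sessions")  # ~16.4 PB

Numbers like these are manageable on disk but not if kept resident in accelerator or host memory, which is roughly the distinction PeterStuer draws below.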

PeterStuer | today at 4:47 PM

It may be persisted but it is not live in the inference engine.

cadamsdotcom | today at 12:43 AM

Sure, it wouldn’t make sense if they only had one customer to serve :)

uoaei | today at 9:44 AM

Exactly: even in the throes of today's wacky economic tides, storage is still cheap. Write the model state to disk immediately after the N context messages are in cache, then reload it later without extra inference on the context tokens themselves. Even if every user did this for ~3 conversations, you would still only need a small fraction of a typical datacenter to house the drives. The bottleneck becomes architecture/topology and the speed of your buses, problems that have been contended with for decades, not inference time on GPUs.
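
A minimal sketch of that idea with a HuggingFace causal LM is below. The model name and file path are placeholders, the exact structure of past_key_values varies across transformers versions, and production serving stacks handle cache persistence very differently; this only illustrates the save-and-reload pattern.

    # Minimal sketch: run the context once, persist the resulting KV cache to
    # disk, then reload it later so only new tokens need a forward pass.
    # "gpt2" and "session_cache.pt" are placeholders for illustration.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for any causal LM
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    context = "Long system prompt and prior conversation go here."
    ctx_ids = tok(context, return_tensors="pt").input_ids

    with torch.no_grad():
        out = model(ctx_ids, use_cache=True)  # fills the KV cache

    # Persist the cache (and the ids it corresponds to) on cheap storage.
    torch.save({"ids": ctx_ids, "kv": out.past_key_values}, "session_cache.pt")

    # ...hours or days later, in a fresh process...
    state = torch.load("session_cache.pt", weights_only=False)
    new_ids = tok(" The user's next message.", return_tensors="pt").input_ids

    with torch.no_grad():
        # Only the new tokens are processed; the saved cache supplies the
        # attention keys/values for the old context.
        out2 = model(new_ids, past_key_values=state["kv"], use_cache=True)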
