Hacker News

sharts | yesterday at 10:50 PM

It doesn’t make sense to pay more for cache warming. Your session is, for the most part, already persisted. Why would it be reasonable to pay again to continue where you left off at any time in the future?


Replies

jeremyjh | yesterday at 11:22 PM

Because it significantly increases actual costs for Anthropic.

If they ignored this, the users who don’t do it much would end up subsidizing the people who do.

danso | today at 2:15 AM

Genuine question: is the cost of keeping a persistent warmed cache for sessions that idle for hours or days not significant when done for hundreds of thousands of users? Wouldn’t it become a resource constraint for Anthropic at some point?
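
For a sense of scale, here is a rough back-of-envelope sketch in Python. The layer count, head count, precision, context length, and session count are illustrative assumptions, not anything Anthropic has disclosed; the point is only how quickly resident KV-cache state adds up.

    # Rough back-of-envelope for how much memory a warmed KV cache ties up
    # per idle session. All model dimensions below are illustrative
    # assumptions, not Anthropic's actual architecture.
    layers = 80          # transformer layers (assumed)
    kv_heads = 8         # grouped-query KV heads (assumed)
    head_dim = 128       # per-head dimension (assumed)
    bytes_per_value = 2  # fp16/bf16
    tokens = 100_000     # a long coding session's context (assumed)

    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # keys + values
    per_session_gb = per_token * tokens / 1e9
    print(f"{per_session_gb:.1f} GB per session")   # ~32.8 GB with these numbers

    idle_sessions = 500_000
    total_pb = per_session_gb * idle_sessions / 1e6
    print(f"{total_pb:.1f} PB across {idle_sessions:,} idle sessions")  # ~16.4 PB

Numbers like these are manageable on disk but not if kept resident in accelerator or host memory, which is roughly the distinction PeterStuer draws below.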

PeterStuer | today at 4:47 PM

It may be persisted but it is not live in the inference engine.

cadamsdotcom | today at 12:43 AM

Sure, it wouldn’t make sense if they only had one customer to serve :)

uoaei | today at 9:44 AM

Exactly: even in the throes of today's wacky economic tides, storage is still cheap. Write the model state to disk immediately after the N context messages are in cache, then reload it later without extra inference on the context tokens themselves. Even if every user did this for ~3 conversations, you would still only need a small fraction of a typical datacenter to house the drives. The bottleneck becomes architecture/topology and the speed of your buses, problems that have been contended with for decades, not inference time on GPUs.
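
A minimal sketch of that idea with a HuggingFace causal LM is below. The model name and file path are placeholders, the exact structure of past_key_values varies across transformers versions, and production serving stacks handle cache persistence very differently; this only illustrates the save-and-reload pattern.

    # Minimal sketch: run the context once, persist the resulting KV cache to
    # disk, then reload it later so only new tokens need a forward pass.
    # "gpt2" and "session_cache.pt" are placeholders for illustration.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for any causal LM
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    context = "Long system prompt and prior conversation go here."
    ctx_ids = tok(context, return_tensors="pt").input_ids

    with torch.no_grad():
        out = model(ctx_ids, use_cache=True)  # fills the KV cache

    # Persist the cache (and the ids it corresponds to) on cheap storage.
    torch.save({"ids": ctx_ids, "kv": out.past_key_values}, "session_cache.pt")

    # ...hours or days later, in a fresh process...
    state = torch.load("session_cache.pt", weights_only=False)
    new_ids = tok(" The user's next message.", return_tensors="pt").input_ids

    with torch.no_grad():
        # Only the new tokens are processed; the saved cache supplies the
        # attention keys/values for the old context.
        out2 = model(new_ids, past_key_values=state["kv"], use_cache=True)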
