LLMs are stateless, to predict next tokens they need the history. When you write your own agents you...

isbvhodnvemrwvn • yesterday at 8:33 PM • 2 replies

LLMs are stateless: to predict the next token they need the whole history. When you write your own agents you can be very selective, trimming context and heavily segmenting different tasks, but generic ones don't do that (at best they spawn subagents to handle smaller tasks).
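
To make the "resend the history, but trim it" point concrete, here is a minimal sketch of what a hand-written agent might do before each call. Everything in it is an assumption for illustration: `Message`, `rough_token_count`, `trim_history`, and `build_request` are hypothetical names, and the word-count "tokenizer" is a crude stand-in for a real one.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def rough_token_count(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def trim_history(history: list[Message], budget: int) -> list[Message]:
    """Keep a leading system message plus the most recent messages that fit the budget."""
    system = [m for m in history[:1] if m.role == "system"]
    rest = history[len(system):]
    used = sum(rough_token_count(m.content) for m in system)
    kept: list[Message] = []
    # Walk backwards so the newest messages are kept first.
    for m in reversed(rest):
        cost = rough_token_count(m.content)
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + kept[::-1]


def build_request(history: list[Message], user_input: str, budget: int = 50) -> list[Message]:
    # The model keeps no state between calls, so every request carries the
    # (trimmed) conversation so far plus the new user message.
    return trim_history(history + [Message("user", user_input)], budget)
```

A generic agent would effectively skip `trim_history` and send everything, which is why its context grows with every turn.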

Replies

lxgr • yesterday at 10:36 PM

That said, the KV cache is very much not stateless, so internally inference APIs will be highly incentivized to route requests to instances that already have as much of the shared prefix cached as possible.

rkagerer • today at 9:56 AM

Thanks. If I ran it locally, presumably I could keep the state cached forever. Can you "reserve" resources from a frontier provider to guarantee your state stays "hot"? (Analogous to reserving a whole VM instead of a slice.)

