
zozbot234 · yesterday at 11:19 AM

> But more importantly, LLMs aren’t stateless across executions, which is a huge no-no.

They are, actually. A "fresh chat" with an LLM is non-deterministic, but it is stateless. Of course, agentic workflows add memory, possibly RAG, etc., but that memory is stored somewhere in plain English; you can just go and look at it. The workflow may not be stateless, but its state is fully known.
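A minimal sketch of the point being made (the `call_llm` function is a hypothetical stand-in, not a real API): every call receives its entire context as plain text, so any "memory" an agent has is just text spliced into the next prompt, and there is nothing hidden to inspect.

```python
def call_llm(messages):
    # Stand-in for a real completion API: deterministically echoes the
    # last user message so the sketch is runnable. A real model would be
    # non-deterministic, but it would still see only `messages`.
    return "echo: " + messages[-1]["content"]

def chat_turn(history, memory_notes, user_msg):
    # All the state the model "has" is assembled right here, as plain text.
    messages = (
        [{"role": "system", "content": "Notes:\n" + "\n".join(memory_notes)}]
        + history
        + [{"role": "user", "content": user_msg}]
    )
    return call_llm(messages)

# Identical inputs -> identical visible state; nothing persists between calls.
out1 = chat_turn([], ["user prefers metric units"], "hi")
out2 = chat_turn([], ["user prefers metric units"], "hi")
```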


Replies

manuelabeledo · yesterday at 11:31 AM

Using the managed-runtime analogy, what you are saying is that, if I wanted to benchmark LLMs the way I would benchmark runtimes, I would need to take the delta between versions, plus the delta between whatever memory they may have accumulated. I don't see how that helps with reproducibility.

Perhaps more importantly, how would I quantify such "memory"? In other words, how could I verify that two memory inputs are identical, and how could I formalize the entirety of such inputs so that the same inputs yield the same outputs?
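One hedged sketch of how the "are two memory inputs the same?" question could be approached, assuming the memory store is plain structured text as the parent comment suggests: canonicalize it and hash it, so two benchmark runs can at least be checked for identical starting state.

```python
import hashlib
import json

def memory_fingerprint(memory):
    # Canonicalize the memory store (here assumed to be a JSON-serializable
    # dict of plain-text notes) and hash it. Two runs with the same
    # fingerprint start from verifiably identical memory inputs.
    canonical = json.dumps(memory, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Key order does not matter once canonicalized:
m1 = {"notes": ["prefers metric"], "facts": {"name": "Ada"}}
m2 = {"facts": {"name": "Ada"}, "notes": ["prefers metric"]}
```

This only answers the "same inputs" half, of course; it says nothing about whether the model then produces the same outputs.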