> graph-memory research
The problem I have with graph representations relative to LLMs is that you can never directly apply the concept. Everything that speaks to an LLM must ultimately be serialized. There's no getting around the token stream semantics.
I've found that one big flat markdown file tends to outperform everything. You could certainly project an event log into a graph and then serialize that, but it starts to feel like a Rube Goldberg machine at this point. It's a lot easier if you just work with the same terms that the models do.
Remember if that big document rarely changes and everything that comes before it is also constant, you'll pay something like 10% of the normal rate with providers like OAI for the tokens in those documents. The clever schemes to piecemeal out information feel good to the ego and might appeal to accounting at first glance, but I think the bitter lesson will ultimately win out here. We already have a million token context windows. Even if 90% of that is bullshit it's still a lot of tokens to work with.
That's pretty much the architecture I'm using in my personal coding harness Tau (tau-agent.dev) . There are some other points in here, but there are relatively minor. I think the observation that event log / event sourcing / cqrs works perfectly for harnesses is not very novel.
I think this can be safely ignored.
"...and how it extends the BabyAGI lineage and prior graph-memory research. "
From BabyAGI from two years ago: "This is a framework built by Yohei who has never held a job as a developer. The purpose of this repo is to share ideas and spark discussion and for experienced devs to play with. Not meant for production use. Use with cautioun."
Very cool. I settled on the same/similar design in my agent harness.
All relevant events that affect the context window are stored in an event log. Forking agents and sessions is simply setting a pointer to the sequence number of another event log.
So if you want to check an implementation of this pattern see: https://github.com/smartcomputer-ai/lightspeed
This is true after learning this framing.
It's more like the log is the only user/agent accepted consensus. It has to be the grounding base. Although extending it into an agentic system architecture becomes something not necessarily effective in practice.
The paper’s pip library can be tried here
Current text-based LLMs are the same old story - text-based vs graphical UIs that ate them whole for most of humanity:
Chatbot is the command line
Agent is the bash script
___ is the GUI (macOS/Windows/GTA 6)
You need Xerox PARC all over again and we have one
Can someone explain why such a trivial knowhow is paper-worthy? Event sourcing is well known
This paper points at an idea, but its really only legible if you have a more developed version of the idea already. I really should write more
As others have commented, this is an obvious application of event sourcing. It's irritating to see the claim of "deterministic replay" in the abstract along with the caveat "we can't actually do deterministic replay, so we store all of the model's responses and reproject off of that". Sure, ok, whatever. You're doing session recording and calling it replay.
> In this arrangement the log is a byproduct: an audit artifact written alongside the real computation, never the substrate of it.
I’ve come to the same conclusion building my own agents. It simply feels ‘wrong’ that most frameworks will happily mutate your context. You have to explicitly go out of your way to store the original events. I’ve now started storing an event log for my own agents, this is used as the source of truth for deriving all subsequent context.
The great thing about this is that I have finer control over drift in long runs, as I can look back through the conversation/tool history and build context suitable for the current state of the agent. It also allows me to run compactions across the entire event history instead of ‘compactions on top of compactions’ which happens on long runs with checkpoints.
It definitely feels like this will be a bigger issue going forward as we have agents running longer and more complex workflows, I’ve started building a product aimed at addressing this issue in a framework agnostic way. [0]
Arrived at a version of this view as well and building one on Elixir/Ash.
Didn't read the paper yet, but if you have a giant log, I'd guess that's RLMable?
weird why did my same submission 3 days ago not get picked up . how does the algorithm work https://news.ycombinator.com/item?id=48752135
This is one of the most interesting papers I've seen. Someone said it's AI slop, well I sent it to 5.5 Pro and it was a great read.
My log has a message for you.
if the folks at Anthropic/OpenAI can stop their loops for one second they would've figured this out too
but wouldn't feeding that log for each request/response iteration must get expensive really fast no?
also "We discuss--without claiming to demonstrate--" wtf? someone had a showerthought and slopped this out in 10mins to see what others thought?
[flagged]
[flagged]
Very cool work!! This is the same pattern we used at $MY_STARTUP to develop $MY_HARNESS which persists the entire graph to disk, unlike all the other agent harnesses which only store the graph nodes and edges.
Event graphs aren’t just the agentic foundation for $MY_HARNESS — they’re the working cognitive substrate, native to what our favorite toolcall gremlins actually consume.
(Looking for lead investors for our angel syndicate btw! DM me if interested)