As others have commented, this is an obvious application of event sourcing. It's irritating to see the claim of "deterministic replay" in the abstract along with the caveat "we can't actually do deterministic replay, so we store all of the model's responses and reproject off of that". Sure, ok, whatever. You're doing session recording and calling it replay.
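To make the recording-versus-replay distinction concrete, here is a minimal Python sketch. It is hypothetical and not the paper's code: `Event`, `SessionLog`, `live_step`, and `flaky_model` are illustrative names. The model is called once during the live run and its reply is stored. "Replaying" then only folds over the stored events, and that is the only reason the result is reproducible.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    kind: str     # "user" or "model" in this sketch
    payload: str


@dataclass
class SessionLog:
    """Hypothetical append-only event log; state is derived, never mutated."""
    events: list[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)

    def project(self) -> list[str]:
        # Rebuild the transcript purely from stored events. No model call
        # happens here, which is why the output never changes.
        return [f"{e.kind}: {e.payload}" for e in self.events]


def live_step(model: Callable[[list[str]], str], log: SessionLog, prompt: str) -> str:
    # The original run: the model really is invoked, and whatever it
    # returns (deterministic or not) is recorded verbatim.
    log.append(Event("user", prompt))
    reply = model(log.project())
    log.append(Event("model", reply))
    return reply


def flaky_model(history: list[str]) -> str:
    # Stand-in for an LLM: identical input can produce different output.
    return random.choice(["yes", "no", "maybe"])
```

Calling `flaky_model` twice on the same history may disagree, but `log.project()` always returns the same transcript. That is session recording plus projection, not deterministic re-execution.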
Agree with your critique. I think this work is presenting common ideas as novel without thinking through existing problems. Defining a provider-agnostic event graph that enables branching and replay of full sessions was the whole point of pi: https://mariozechner.at/posts/2025-11-30-pi-coding-agent/ , though the language around it perhaps didn’t click until a bit later. I don’t even think pi was the first to do this.
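A provider-agnostic event graph with branching can be sketched as a tree of events linked by parent pointers. The following is an illustrative assumption, not pi's actual session format: `Node` and `EventGraph` are hypothetical names. Events carry only plain role and content strings, so no provider-specific types are involved. Branching means attaching a new event to any earlier event, and a session's history is recovered by walking parents back to the root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    id: int
    parent: Optional[int]  # None marks the root of a session
    role: str              # plain strings keep the graph provider-agnostic
    content: str


class EventGraph:
    """Hypothetical tree of session events; any node can be a branch point."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self._next_id = 0

    def add(self, parent: Optional[int], role: str, content: str) -> int:
        # Branching is just adding a child to an existing (possibly older) node.
        if parent is not None and parent not in self.nodes:
            raise KeyError(f"unknown parent {parent}")
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = Node(node_id, parent, role, content)
        return node_id

    def path(self, leaf: int) -> list[Node]:
        # Walk parent pointers back to the root, then reverse into
        # chronological order. Shared prefixes are stored only once.
        out: list[Node] = []
        cur: Optional[int] = leaf
        while cur is not None:
            node = self.nodes[cur]
            out.append(node)
            cur = node.parent
        return list(reversed(out))
```

Two branches created from the same root share that root node in their `path()` output. Storing the shared prefix once is cheap, but continuing a branch still means sending that prefix to a model again.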
Another critique: the abstract mentions how their system allows for “branch[ing] a run at any event without re-executing the shared prefix,” but that’s only possible with very careful KV caching. Generally, rerunning inference from an earlier point still incurs an O(n) input-token cost, where n is the length of the shared prefix, and this paper is working at the wrong layer to see that. In this work, "execution" refers to tool calls, but model inference is the expensive part.
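The cost argument can be made concrete with simple arithmetic. Here is a rough Python sketch that counts how many prefill (input) tokens a model must process when several branches continue from the same shared prefix. `branch_prefill_cost` is a hypothetical helper, and the `kv_cache=True` case is idealized. It assumes the prefix is computed once and then reused by every branch with no eviction. Real prompt caches can expire and may still bill cached reads, so treat that case as a lower bound.

```python
def branch_prefill_cost(prefix_tokens: int, suffix_tokens: int,
                        branches: int, kv_cache: bool) -> int:
    """Prefill tokens processed when `branches` runs fork from one shared prefix.

    prefix_tokens: length n of the shared history up to the branch point
    suffix_tokens: new tokens each branch adds before its first model call
    """
    if min(prefix_tokens, suffix_tokens, branches) < 0:
        raise ValueError("arguments must be non-negative")
    if kv_cache:
        # Idealized reuse: the prefix is prefilled once, then every branch
        # only pays for its own new tokens.
        return prefix_tokens + branches * suffix_tokens
    # No reuse: every branch re-prefills the entire shared prefix, so each
    # branch costs O(n) in the prefix length.
    return branches * (prefix_tokens + suffix_tokens)
```

With a 50,000-token prefix, 500 new tokens per branch, and 4 branches, the uncached case processes 4 × 50,500 = 202,000 input tokens, versus 50,000 + 4 × 500 = 52,000 in the idealized cached case. Skipping tool re-execution does nothing to reduce the first number.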