We should probably only interact with the agent by writing to the log, which it executes from, and the agent should probably only interact with the external environment by writing and executing code. That fixes a lot of issues with non-determinism.
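A rough sketch of what that could look like, purely as an illustration: `AgentLog`, `run_code`, and `step` are made-up names, and the convention that agent code leaves its answer in a variable called `result` is an assumption. Every piece of code the agent writes goes into the log before it runs, and the outcome (including a failure) goes in right after.

```python
from dataclasses import dataclass, field


@dataclass
class AgentLog:
    """Append-only log; hypothetical single source of truth for a run."""
    entries: list = field(default_factory=list)

    def append(self, kind, payload):
        # Each entry gets a sequence number so ordering is explicit.
        self.entries.append(
            {"seq": len(self.entries), "kind": kind, "payload": payload}
        )


def run_code(source):
    """Execute agent-written code; capture its `result` variable or the error."""
    scope = {}
    try:
        exec(source, scope)
        return {"ok": True, "result": scope.get("result")}
    except Exception as exc:
        return {"ok": False, "error": repr(exc)}


def step(log, source):
    """Log the code first, then run it, then log what happened."""
    log.append("code", source)
    outcome = run_code(source)
    log.append("outcome", outcome)
    return outcome


log = AgentLog()
step(log, "result = 1 / 0")  # fails; the failure is still recorded
step(log, "result = 1 / 1")  # a second attempt, recorded the same way
```

In this sketch nothing reaches the outside world except through `step`, so the log is a complete account of what the agent did and what it got back.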
What if a tool produces an error and gets retried? Is the retry loop now part of the log?
Agreed. While not directly applicable, I was a huge fan of Mozilla's rr [0] in undergrad. Quoting their site:
> rr records a group of Linux user-space processes and captures all inputs to those processes from the kernel, plus any nondeterministic CPU effects performed by those processes (of which there are very few).
I think the solution will resemble that. You don't control the LLM, sure. But you can control what it sees, and maybe that's good enough.
[0] https://rr-project.org/
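To make the "control what it sees" idea concrete, here is a minimal record/replay sketch. It is not how rr works internally, just the same principle at toy scale. `Recorder` and `flaky_tool` are hypothetical names: the wrapper records every value a nondeterministic input source returns, and in replay mode hands the recorded values back instead of calling the source again.

```python
import random


class Recorder:
    """Record the outputs of a nondeterministic function, or replay a tape."""

    def __init__(self, fn, tape=None):
        self.fn = fn
        self.replaying = tape is not None
        self.tape = list(tape) if tape is not None else []
        self.pos = 0

    def __call__(self, *args):
        if self.replaying:
            recorded_args, value = self.tape[self.pos]
            # If the caller asks for something different than last time,
            # the run has diverged and the tape is no longer valid.
            if recorded_args != args:
                raise RuntimeError("replay diverged from recording")
            self.pos += 1
            return value
        value = self.fn(*args)
        self.tape.append((args, value))
        return value


def flaky_tool(query):
    """Stand-in for any input the model sees that varies between runs."""
    return f"{query}:{random.random()}"


live = Recorder(flaky_tool)
first = [live("a"), live("b")]

replay = Recorder(None, tape=live.tape)  # no live function needed
second = [replay("a"), replay("b")]
```

Here `second` matches `first` exactly even though `flaky_tool` is random, because the replay never calls it. The LLM itself stays uncontrolled in this picture; only its inputs are pinned.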