I've been reviewing agent sandboxing solutions recently, and it occurred to me that there is a gaping vector for persistent exploits in tools that let the agent write to the project directory, as this one does.
I had originally thought this would be OK, as we could review everything in the git diff. But it later occurred to me that there are all kinds of files the agent could write to that I'd end up executing, as the developer, outside the sandbox: every .pyc file, for instance, files in .venv, and .git hook files. None of these normally show up in a git diff, since they are either gitignored or live inside .git itself.
ChatGPT[1] confirms the underlying exploit vectors and also that there isn't much discussion of them in the context of agent sandboxing tools.
My conclusion from that is that the only truly safe sandboxing technique would be one that transfers files from the sandbox to the dev's machine through some kind of git patch or similar. I.e. a file can only transfer if it's in version control and has therefore, presumably, been reviewed by the dev before it leaves the sandbox.
I'd really like to see people talking more about this. The solution isn't that hard: keep CWD as an overlay and transfer files modified in the container through a proxy of some kind that filters out any file not in git, and maybe some that are in git but are known to be potentially dangerous (e.g. binaries). Obviously, there would need to be some kind of configuration option here.
1: https://chatgpt.com/share/69c3ec10-0e40-832a-b905-31736d8a34...
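A rough sketch of the filtering step described above, not how any existing tool does it. The names `filter_transfer`, `BLOCKED_SUFFIXES` and `BLOCKED_DIRS` are made up for illustration, and the blocklists are assumptions that a real tool would expose as configuration.

```python
from pathlib import PurePosixPath

# Hypothetical defaults; a real tool would make these configurable.
BLOCKED_SUFFIXES = {".pyc", ".so", ".dll", ".exe", ".bin"}
BLOCKED_DIRS = {".git", ".venv", "node_modules", "__pycache__"}


def filter_transfer(changed, tracked):
    """Split paths changed in the sandbox overlay into (allowed, rejected).

    changed: repo-relative POSIX paths modified inside the sandbox.
    tracked: repo-relative POSIX paths that git tracks on the host.
    """
    tracked = set(tracked)
    allowed, rejected = [], []
    for path in changed:
        p = PurePosixPath(path)
        untracked = path not in tracked
        risky_suffix = p.suffix.lower() in BLOCKED_SUFFIXES
        risky_dir = any(part in BLOCKED_DIRS for part in p.parts)
        if untracked or risky_suffix or risky_dir:
            rejected.append(path)
        else:
            allowed.append(path)
    return allowed, rejected
```

The `tracked` list would come from something like `git ls-files` run on the host, outside the sandbox, so the agent cannot influence it. Passing the filter only makes a file eligible for review; it says nothing about whether the contents are safe.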
I don't follow why you'd run uncommitted, unreviewed code outside of the sandbox (by sandbox I mean something as secure as a VM). My mental model is more that you no longer compile or run code outside of the sandbox: it contains everything, and when a change is ready you ship it after a proper review.
The way I'd do it right now:
* git worktree to have a specific folder with a specific branch to which the agent has access (with the .git in another folder)
* have a proper review before moving the commits from there into another branch, committing from outside the sandbox
* run code from this review-protected branch if needed
Ideally, within the sandbox, the agent can go nuts: run tests, do visual inspections (e.g. for web dev), maybe run a demo for me to see.
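The three steps above could look roughly like this as git commands. This is only a sketch: `worktree_review_commands` is a hypothetical helper, the branch and path names are placeholders, and using `git merge --squash` for the "move commits after review" step is one possible choice, not the only one.

```python
def worktree_review_commands(repo, worktree, agent_branch, base_branch="main"):
    """Return the git commands for the worktree workflow, in order.

    repo: path to the main checkout (stays outside the sandbox).
    worktree: folder that gets exposed to the agent.
    """
    git = ["git", "-C", repo]
    return [
        # 1. Dedicated folder and branch for the agent to work in.
        git + ["worktree", "add", "-b", agent_branch, worktree, base_branch],
        # 2. Show what the agent branch changed since it forked from base,
        #    for a human to review outside the sandbox.
        git + ["diff", f"{base_branch}...{agent_branch}"],
        # 3. Stage the reviewed changes on the checked-out branch without
        #    committing, so the commit itself is made outside the sandbox.
        git + ["merge", "--squash", agent_branch],
    ]
```

Each entry can be passed to `subprocess.run(cmd, check=True)` on the host, with the review happening between steps 2 and 3. How the sandbox gets access to the worktree folder, and whether it can see the main repository's .git at all, depends on the sandbox setup and is not covered here.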
Yeah, never allow githooks ;)
It's a good point. Maybe I should add an option to make certain directories read-only even under the current working directory, so that you can make .git/ read-only without moving it out of the project directory.
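To make the idea concrete, here is a sketch of the policy such an option would express. It is not how jai works or would implement it; `is_write_allowed` and its parameters are hypothetical, and a real sandbox would more likely enforce this at the mount level than with a path check like this one.

```python
import posixpath


def is_write_allowed(path, cwd, read_only=(".git",)):
    """Return True if `path` is inside `cwd` and not under a read-only subdir.

    cwd must be an absolute POSIX path; `path` may be relative to it.
    Symlinks are NOT resolved here, which a real implementation must handle.
    """
    cwd = posixpath.normpath(cwd)
    full = posixpath.normpath(posixpath.join(cwd, path))
    # Reject anything that escapes the project directory (e.g. via "..").
    if posixpath.commonpath([cwd, full]) != cwd:
        return False
    rel = posixpath.relpath(full, cwd)
    for ro in read_only:
        ro = posixpath.normpath(ro)
        # Match the directory itself or anything below it, but not
        # siblings that merely share a prefix (".gitignore" vs ".git").
        if rel == ro or rel.startswith(ro + "/"):
            return False
    return True
```

With the default `read_only=(".git",)`, a write to `.git/hooks/pre-commit` is refused while `.gitignore` and ordinary source files stay writable.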
You can already make CWD an overlay with "jai -D". The tricky part is how to merge the changes back into your main working directory.