What would a better system look like?
Agents should make better use of OS sandboxing facilities with finer-grained ACLs.
Less: Do you want to run "npm run build"?
More: "npm run build" tried to read your Chrome cookie database, do you want to allow that?
Some agents like Codex use sandboxing on Linux/MacOS but the permissions are far too coarse - they'll run the command in a relatively strict sandbox and when it fails they'll ask you to allowlist the command as a whole, forever. There should be a new permission prompt every time a command tries to do something new.
Claude suggests (or used to suggest - it's been a while) to allowlist "bash" which completely defeats the point. If you do that the agent can run `bash -c "echo literally anything"`
Don’t rely on your non deterministic agent and its creators to secure your software. Design defense in depth and trust guardrails that don’t expect Anthropic to vibe good security into existence.
If you start by treating any autonomous actor in your system as an actor with the potential to go rogue the design starts to create itself
Not using agents at all. It could edit your code to do something malicious when you run it. Not even once. Not even if the agent has a gun to your head.
Don’t give a fancy random text generator access to your computer.