There are other models. Eschew the sandbox. Give the agent a computer, with all the trimmings, but keep that computer segregated from sensitive resources. Tokens are a solved problem: tokenize them[1] or do something equivalent with a proxy. The same thing goes for secrets.
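To make "tokenize them" concrete: the core of the swap the proxy does looks something like this (a sketch, not the actual Fly implementation; the hosts, names, and env var are all made up):

    import os

    PLACEHOLDER = "tok_agent_placeholder"          # all the agent ever sees
    REAL_TOKEN = os.environ["UPSTREAM_API_TOKEN"]  # lives only on the proxy
    ALLOWED_HOSTS = {"api.github.com"}             # where the real token may go

    def rewrite_outbound(host: str, headers: dict) -> dict:
        """Swap the placeholder for the real credential, only for allowed hosts."""
        auth = headers.get("Authorization", "")
        if PLACEHOLDER not in auth:
            return headers
        if host not in ALLOWED_HOSTS:
            raise PermissionError(f"refusing to send a credential to {host}")
        return {**headers, "Authorization": auth.replace(PLACEHOLDER, REAL_TOKEN)}

The agent's computer can be fully compromised and the attacker still only holds the placeholder.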
A lot of this post presents false dichotomies. It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why? There are reasons to do that and reasons not to do that. You can have a durable computer with a network identity and full connectivity, and you can have that computer spin down and stop billing when not in use.
There are a zillion different shapes for addressing these problems, and I'm twitchy because I think people are super path-dependent right now, and it's causing them to miss a lot of valuable options.
[1]: https://fly.io/blog/tokenized-tokens/ (I work at Fly.io but the thing this post talks about is open source).
> It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why?
Because the moment you use k8s, you have to assume that, apparently. Or so I'm told by all the infrastructure people I speak with. Getting these pods to not disappear just because one process ran out of memory has been a herculean task.
I wish our standard deploy processes produced durable computers that don't break the bank, but that hasn't been an easy requirement for small infra teams.
Author here.
This is an interesting and novel field, so I’m not pretending I know the answers, but this is what worked for us :)
At the end of the day, and oversimplifying things: why would I want to give a for loop that calls an API (an LLM) its own dedicated sandbox/computer?
When the model wants to run a command, it’ll tell you so. That doesn’t need to be a local exec; you can run it anywhere, and the model won’t know the difference.
The agent loop itself doesn’t need sandboxing, and most tool calls don’t require sandboxing either. For the tools that do require a computer, you can route those requests there when needed, rather than running the whole program in the sandbox.
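Concretely, something like this (a sketch; `call_llm` and the sandbox endpoint are stand-ins, not real APIs):

    import requests  # assumes the requests package

    SANDBOX_URL = "https://sandbox.internal/exec"  # hypothetical exec endpoint

    def run_tool(name: str, args: dict) -> str:
        """Route tool calls: most run in-process, only exec needs a computer."""
        if name == "read_file":
            return open(args["path"]).read()       # no sandbox needed
        if name == "exec":
            # Ship the command to the sandbox; the model can't tell the difference.
            resp = requests.post(SANDBOX_URL, json={"cmd": args["cmd"]}, timeout=60)
            resp.raise_for_status()
            return resp.json()["output"]
        raise ValueError(f"unknown tool: {name}")

    def agent_loop(call_llm, task: str) -> str:
        """The loop itself is just a while loop around an API call."""
        messages = [{"role": "user", "content": task}]
        while True:
            reply = call_llm(messages)             # stand-in LLM client
            if reply.get("tool") is None:
                return reply["content"]            # model is done
            messages.append(
                {"role": "tool", "content": run_tool(reply["tool"], reply["args"])}
            )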
To me running the agent loop in the sandbox itself feels like “you should run your API in your DB container because it’ll talk to it at some point”.
I'm also very excited by the different shapes for solving problems in this space. A little worried that the path dependence is ACTUALLY warranted, since "popular harness engineering is just claude-wrapping" is a bit of a self-fulfilling prophecy today.
I've heard many claims that because LLMs are tuned to specific harnesses, we should expect worse performance with novel architectures. That seems to make people reluctant to put effort into inventing them.
Wow, thanks for writing this up.
I'm building an agent sandboxing system for a client atm, and was about to start working on a system of ephemeral, short-lived, derived secrets for the agent to use.
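Roughly the shape I had in mind, for the curious (a sketch with made-up names; the master key never leaves the minting service):

    import base64, hashlib, hmac, json, time

    MASTER_KEY = b"<long-lived secret, never given to the agent>"

    def mint(scope: str, ttl: int = 300) -> str:
        """Derive a short-lived, scoped token the agent can safely hold."""
        claims = {"scope": scope, "exp": int(time.time()) + ttl}
        payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
        sig = hmac.new(MASTER_KEY, payload, hashlib.sha256).hexdigest().encode()
        return (payload + b"." + sig).decode()

    def verify(token: str) -> dict:
        """Verifier side (proxy / resource server): reject forgeries and expiries."""
        payload, sig = token.encode().rsplit(b".", 1)
        expected = hmac.new(MASTER_KEY, payload, hashlib.sha256).hexdigest().encode()
        if not hmac.compare_digest(sig, expected):
            raise PermissionError("bad signature")
        claims = json.loads(base64.urlsafe_b64decode(payload))
        if claims["exp"] < time.time():
            raise PermissionError("expired")
        return claims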
Lots of great thoughts to steal in this piece. Thanks again.
I agree with the argument that there are many more than two ways to do this. When I built my AI assistant (https://stavrobot.stavros.io/), for example, I implemented an architecture that combines both of the approaches detailed in the post. The harness runs simultaneously both inside and outside the container (I didn't want the harness to touch the system, and I didn't want LLM-generated code to touch the harness).
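The split looks roughly like this (a sketch of the idea, not my actual code; assumes Docker, and the file names are made up):

    # outer.py -- the outer harness; never executes model-generated code itself
    import json, subprocess

    runner = subprocess.Popen(
        ["docker", "run", "-i", "--rm", "--network=none", "runner-image",
         "python", "/runner.py"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )

    def run_in_container(code: str) -> str:
        runner.stdin.write(json.dumps({"code": code}) + "\n")
        runner.stdin.flush()
        return json.loads(runner.stdout.readline())["output"]

    # runner.py -- the inner half; executes code, knows nothing about the loop
    import contextlib, io, json, sys

    for line in sys.stdin:
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            try:
                exec(json.loads(line)["code"], {})
            except Exception as e:
                print(f"error: {e}")
        print(json.dumps({"output": buf.getvalue()}), flush=True)

LLM-generated code only ever runs inside runner.py, and runner.py never sees the loop, the prompts, or the keys.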
It's all tradeoffs, and picking the ones that work for what you want to do is what architecture is. The more informed you are about the tradeoffs, the better you can make your architecture.
I'd argue you are still using a sandbox, just at a higher ring (outside the machine/VM), relying on app/resource-level permissions on each of your external resources to enforce it, which requires _all_ of those external systems to be hardened versus just the agent host itself. The capabilities a full machine has for exploring and exploiting external, ostensibly secured systems have already been touched on via incidents like the Anthropic internal model jailbreak. [0]
Giving the agent the whole machine also doesn't answer the question of how it can hook into actions that eventually require more perms, and even if you "airgap" those via things like output queues that humans need to approve, that still feels "harnessey" to me.
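(By "output queues" I mean something like this sketch, with hypothetical names: nothing privileged runs until a human explicitly pops it off the queue.)

    import queue

    pending = queue.Queue()  # privileged actions wait here

    def request_privileged(action: dict) -> str:
        """Agent side: enqueue instead of executing; the agent only gets a receipt."""
        pending.put(action)
        return "queued for human review"

    def review(execute) -> None:
        """Human side: nothing runs until someone explicitly approves it."""
        while not pending.empty():
            action = pending.get()
            if input(f"approve {action}? [y/N] ").strip().lower() == "y":
                execute(action)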
I feel a bit guilty of debating semantics here, especially as I can't/don't intend to convey any confidence in a "right answer", but my reason for being pedantic is that I do think there are interesting tradeoffs between "P(jailbreak or unexpected capability use | time)" and "increasing power/available capability set", as well as interesting primitives emerging in terms of the components you'd need regardless of where you drew that line (à la paragraph 2).
[0] - https://www-cdn.anthropic.com/3edfc1a7f947aa81841cf88305cb51... (specifically section 5.5.2.4)