You must explicitly state what your threat model is when writing about security tooling, isolation, and sandboxing.
This threat model is concerned with running arbitrary code generated or fetched by an AI agent on host machines which contain secrets, sensitive files, and/or exfiltratable data, apps, and systems which should not be lost.
What about the threat model where an agent deletes your entire inbox? Or sends your calendar events to a server after a prompt injection? Or makes bank transfers of the wrong amount to the wrong address? All of these are allowed under the sandboxing model.
We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".
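To make the "read my gmail and never write" example concrete, here is a minimal sketch of a deny-by-default gate that sits between the agent and its tools. Everything in it (TaskPolicy, call_tool, PolicyViolation, the "gmail"/"read" names) is hypothetical and made up for illustration; it is not any real Gmail or agent-framework API, just the shape of the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskPolicy:
    """Hypothetical per-task allowlist of (tool, action) pairs."""
    allowed: frozenset

    def permits(self, tool: str, action: str) -> bool:
        # Deny by default: only explicitly listed pairs pass.
        return (tool, action) in self.allowed


class PolicyViolation(Exception):
    """Raised when a tool call falls outside the task's policy."""


def call_tool(policy: TaskPolicy, tool: str, action: str, handler, *args):
    """Run handler only if the policy allows this (tool, action) pair."""
    if not policy.permits(tool, action):
        raise PolicyViolation(f"{tool}.{action} is not permitted for this task")
    return handler(*args)


# Assumed policy for the example in the comment above: read-only Gmail.
READ_ONLY_GMAIL = TaskPolicy(frozenset({("gmail", "read")}))
```

The point of the sketch is that the check runs outside the model: the agent can ask for gmail.delete as often as it likes, and the gate refuses regardless of what the prompt said.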
Sandboxes do not solve permission escalation or exfiltration threats.
You mean like the section which goes into the threat model?
The Security Model: Design for Distrust
I wrote about this in Don’t Trust AI Agents: when you’re building with AI agents, they should be treated as untrusted and potentially malicious. Prompt injection, model misbehavior, things nobody’s thought of yet. The right approach is architecture that assumes agents will misbehave and contains the damage when they do…
I built an agent framework designed from the ground up around policy control (https://github.com/sibyllinesoft/smith-core), and I'm in the process of extracting the gateway from it so people can provide that same policy-gated security to whatever agent they want (https://github.com/sibyllinesoft/smith-gateway).
My posts about these aspects of agent security get zero engagement (not even a salty "vibe slop" comment, lol), so ironically security is the thing everyone's talking about, but most people don't know enough to understand what they need.
That's a great question, and it reminds me of something I read today:
https://entropytown.com/articles/2026-03-12-openclaw-sandbox...
The core issue, to me, is that permissions are inherently binary — can it send an email or not — while LLMs are inherently probabilistic. Those two things are fundamentally in tension.
> We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".
We already have: IAM, WIF, Macaroons, Service Accounts
Ask your resident SecOps and DevOps teams what your company already has available.
> We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".
Yes 100%, this is the critical layer that no one is talking about.
And I'd go even further: we need the ability to dynamically attenuate tool scope (ocap) and trace data as it flows between tools (IFC). Be able to express something like: can't send email data to people not on the original thread.
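A rough sketch of what that could look like, assuming a toy model where every value carries a label listing who may receive it (the IFC part) and the send capability is narrowed to a recipient set before it is handed to the agent (the ocap part). All names here (Labeled, read_thread, summarize, make_sender, FlowViolation) are hypothetical, not taken from any existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A value tagged with the set of addresses allowed to receive it."""
    value: str
    readers: frozenset


class FlowViolation(Exception):
    """Raised when data would flow to a recipient its label does not allow."""


def read_thread(body: str, participants) -> Labeled:
    # Data read from a thread is labeled with that thread's participants.
    return Labeled(body, frozenset(participants))


def summarize(data: Labeled) -> Labeled:
    # Derived data inherits the label of its source (taint propagation).
    return Labeled(data.value[:40], data.readers)


def make_sender(allowed_recipients):
    """Return an attenuated send capability limited to allowed_recipients."""
    allowed = frozenset(allowed_recipients)

    def send(to: str, data: Labeled) -> str:
        if to not in allowed:
            raise FlowViolation(f"capability does not cover {to}")
        if to not in data.readers:
            raise FlowViolation(f"label forbids sending this data to {to}")
        return f"sent to {to}"

    return send
```

Usage would be something like `send = make_sender(thread.readers)`, after which `send("alice@example.com", summarize(thread))` goes through for a thread participant and any other address raises FlowViolation. The two checks are deliberately separate: the capability limits what this sender can ever do, while the label follows the data through whatever tools touch it.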