Hacker News

jryio · today at 2:15 PM · 6 replies

You must explicitly state what your threat model is when writing about security tooling, isolation, and sandboxing.

This threat model is concerned with running arbitrary code generated or fetched by an AI agent on host machines that contain secrets, sensitive files, and other exfiltratable data, apps, and systems which should not be lost.

What about the threat model where an agent deletes your entire inbox? Or sends your calendar events to an attacker's server after prompt injection? Or transfers the wrong amount to the wrong bank address? All of these are allowed under the sandboxing model.

We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".

Sandboxes do not solve permission escalation or exfiltration threats.
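A minimal sketch of what per-task, per-tool permissions could look like. All the names here (ToolPolicy, run_tool, the "gmail"/"read" vocabulary) are hypothetical and not tied to any real agent framework; the point is just that every tool call is checked against a task-scoped allowlist before it runs.

```python
# Hypothetical per-task tool policy: each (tool, action) pair must be
# explicitly allowed, so a read-only grant cannot be escalated to writes.
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    # e.g. {"gmail": {"read"}} means Gmail is read-only for this task
    allowed: dict[str, set[str]] = field(default_factory=dict)

    def check(self, tool: str, action: str) -> bool:
        return action in self.allowed.get(tool, set())


def run_tool(policy: ToolPolicy, tool: str, action: str) -> None:
    if not policy.check(tool, action):
        raise PermissionError(f"{tool}.{action} denied by task policy")
    # ...dispatch to the real tool implementation here...


# "this request should only ever read my gmail and never write,
#  delete, or move emails"
policy = ToolPolicy(allowed={"gmail": {"read"}})
run_tool(policy, "gmail", "read")       # allowed
# run_tool(policy, "gmail", "delete")   # would raise PermissionError
```

The key design choice is default-deny: any (tool, action) pair not in the allowlist is refused, which is the opposite of the sandbox model where everything inside the box is permitted.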


Replies

ryanrasti · today at 4:44 PM

> We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".

Yes 100%, this is the critical layer that no one is talking about.

And I'd go even further: we need the ability to dynamically attenuate tool scope (ocap) and trace data as it flows between tools (IFC), so you can express something like: "can't send email data to people not on the original thread."
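The attenuation half of that can be sketched with object capabilities. This is a toy illustration, not any real framework's API: a capability wraps a tool, and handing it to a sub-agent can only ever narrow its scope, never widen it.

```python
# Hypothetical object-capability sketch: an email capability carries the
# set of recipients it authorizes, and attenuation can only shrink it.
class EmailCap:
    def __init__(self, recipients: frozenset[str]):
        self._recipients = recipients  # who this capability may email

    def attenuate(self, subset: set[str]) -> "EmailCap":
        # Intersection, never union: a holder cannot grant itself
        # recipients the parent capability did not already allow.
        return EmailCap(self._recipients & frozenset(subset))

    def send(self, to: str, body: str) -> None:
        if to not in self._recipients:
            raise PermissionError(f"capability does not allow emailing {to}")
        # ...real delivery would happen here...


# Root capability covers the original thread's participants...
thread_cap = EmailCap(frozenset({"alice@example.com", "bob@example.com"}))

# ...so even if a prompt-injected agent asks to attenuate toward an
# attacker address, the intersection silently drops it.
agent_cap = thread_cap.attenuate({"alice@example.com", "mallory@evil.example"})
agent_cap.send("alice@example.com", "hi")          # on the thread: allowed
# agent_cap.send("mallory@evil.example", "data")   # raises PermissionError
```

The IFC half (tainting data as it flows between tools) would sit on top of this, labeling tool outputs and refusing sends whose labels exceed the capability's scope.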

0cf8612b2e1e · today at 2:48 PM

You mean like the section which goes into the threat model?

  The Security Model: Design for Distrust

  I wrote about this in Don’t Trust AI Agents: when you’re building with AI agents, they should be treated as untrusted and potentially malicious. Prompt injection, model misbehavior, things nobody’s thought of yet. The right approach is architecture that assumes agents will misbehave and contains the damage when they do…
CuriouslyC · today at 2:30 PM

I built an agent framework designed from the ground up around policy control (https://github.com/sibyllinesoft/smith-core) and I'm in the process of extracting the gateway from it so people can provide that same policy gated security to whatever agent they want (https://github.com/sibyllinesoft/smith-gateway).

My posts about these aspects of agent security get zero engagement (not even a salty "vibe slop" comment, lol), so ironically security is the thing everyone's talking about, but most people don't know enough to understand what they need.

chaosprint · today at 5:33 PM

That's a great question, and it reminds me of something I read today:

https://entropytown.com/articles/2026-03-12-openclaw-sandbox...

The core issue, to me, is that permissions are inherently binary — can it send an email or not — while LLMs are inherently probabilistic. Those two things are fundamentally in tension.

verdverm · today at 5:02 PM

> We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".

We already have: IAM, WIF, Macaroons, Service Accounts

Ask your resident SecOps and DevOps teams what your company already has available.
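Macaroons in particular fit the attenuation use case above. A stripped-down illustration of the construction (just the HMAC chain, not the wire format of pymacaroons or any real library): a token is an HMAC over an identifier, each caveat re-HMACs the previous signature, so any holder can add caveats to narrow what the token authorizes, but can never remove one.

```python
# Toy macaroon: an HMAC chain where caveats can be appended by anyone
# holding the token, but only stripped by someone who knows the root key.
import hashlib
import hmac


def _mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: bytes):
    """Server mints a macaroon: (message list, chained signature)."""
    return ([identifier], _mac(root_key, identifier))


def add_caveat(macaroon, caveat: bytes):
    """Anyone can attenuate: the new signature chains off the old one."""
    msgs, sig = macaroon
    return (msgs + [caveat], _mac(sig, caveat))


def verify(root_key: bytes, macaroon, satisfied: set) -> bool:
    """Server replays the chain; every caveat must be satisfied."""
    msgs, sig = macaroon
    cur = _mac(root_key, msgs[0])
    for caveat in msgs[1:]:
        if caveat not in satisfied:
            return False
        cur = _mac(cur, caveat)
    return hmac.compare_digest(cur, sig)


m = mint(b"server-secret", b"agent-session-42")
m = add_caveat(m, b"action = read")  # attenuate: this token is now read-only
verify(b"server-secret", m, {b"action = read"})   # True
verify(b"server-secret", m, set())                # False: caveat unmet
```

Because the signature chain is one-way, an agent handed the read-only token cannot reconstruct the broader token it was derived from, which is exactly the "read my gmail but never write" property.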
