It's a bit annoying that there are so many solutions for running and sandboxing agents but no established best practice. It would be nice to have some high-level orchestration tools like docker/podman where you can configure how e.g. claude code, opencode, codex, openclaw run: in an open shell, an OCI container, jai, etc.
Especially because anybody can ask chatgpt/claude how to run an agent without any further knowledge, I feel we should handle this more like we handle encryption, where the advice is to use established libraries and not implement the algorithms yourself.
I've done some experimenting with running a local model with ollama, with claude code connecting to it, and both inside a firejail: https://firejail.wordpress.com/ What they get access to is very limited, and mostly whitelisted.
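For anyone who wants to try that setup, a minimal firejail profile in this spirit might look like the following. The paths are illustrative assumptions, not the commenter's actual config; `net none` still leaves loopback, which is enough for claude code to reach an ollama instance running inside the same jail:

```
# ~/.config/firejail/agent.profile -- illustrative sketch
caps.drop all                        # drop all Linux capabilities
seccomp                              # default syscall filter
noroot                               # no root user inside the jail
net none                             # private network namespace, loopback only
private-tmp                          # fresh /tmp per run
whitelist ${HOME}/projects/sandbox   # only this directory visible in $HOME
```

Launched with something like `firejail --profile=~/.config/firejail/agent.profile claude`.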
What's the difference between this and agent-safehouse?
How long until agents begin routinely abusing local privilege escalation bugs to break out of containers? I bet if you tell them explicitly not to do so it increases the likelihood that they do.
What would it take for people to stop recklessly running unconstrained AI agents on machines they actually care about? A Stanford researcher thinks the answer is a new lightweight Linux container system that you don't have to configure or think about.
Something like FreeBSD jails would be perfect for agents.
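For anyone unfamiliar with them, a jail definition is just a small config block; the names and paths below are made up for illustration:

```
# /etc/jail.conf sketch -- jail name and paths are illustrative
agent {
    path = "/jails/agent";            # jail root filesystem
    host.hostname = "agent";
    ip4 = "inherit";                  # or assign a dedicated address
    mount.devfs;
    exec.start = "/bin/sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown";
}
```

Start it with `jail -c agent` (or `service jail start agent`) and drop into it with `jexec agent /bin/sh`.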
AI safety is just like any technology safety, you can’t bubble wrap everything. Thinking about early stage of electricity, it was deadly (and still is), but we have proper insulation and industry standards and regulations, plus common sense and human learning. We are safe (most of the time).
This also applies to the first technology human beings developed: fire.
$ lxc exec claude bash
Easy :-) lxd/lxc containers are much much underrated. Works only with Linux though.
I tried something similar while building my tool site — biggest issue was SEO indexing. Fixed it by improving internal linking instead of relying on sitemap.
What if Claude needs me to install some software and hoses my distro? Jai can't protect me there, as I am running the script myself.
.claude/settings.json:

  {
    "sandbox": {
      "enabled": true,
      "filesystem": {
        "allowRead": ["."],
        "denyRead": ["~/"],
        "allowWrite": ["."]
      }
    }
  }
Use it! :) https://code.claude.com/docs/en/sandboxing
Should definitely block .ssh reading too...
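Assuming the schema from the snippet above, an explicit deny list for credential material might look like this (the extra paths are illustrative; `"~/"` already covers them, but listing them makes the intent survive a loosened home-directory rule):

```
{
  "sandbox": {
    "enabled": true,
    "filesystem": {
      "allowRead": ["."],
      "denyRead": ["~/", "~/.ssh", "~/.aws", "~/.config"],
      "allowWrite": ["."]
    }
  }
}
```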
Is there an equivalent for macOS?
i just use seatbelt (mac native) in my custom coding agent: supercode
Not sure I understand the problem. Are people just letting AI do anything? I use Claude Code and it asks for permission to run commands, edit files, etc. No need for a sandbox.
Jai is the name of a programming language, no?
This site was definitely slopcoded with Claude. They have a real distinctive look.
or you can just run nanoclaw for isolation by default?
If it has a big splash page with no technical information, it's trying to trick you into using it. That doesn't mean it isn't useful, but it does mean it's disingenuous.
This particular solution is very bad. To start off with, it's basically offering you security, right? Look, bars in front of an evil AI! An AI jail! That's secure, right? Yet the very first mode it offers you is insecure. The "casual" mode allows read access to your whole home directory. That is enough to grant most attackers access to your entire digital life.
Most people today use webmail. And most people today allow things like cookies to be stored unencrypted on disk. This means an attacker can read a cookie off your disk, and get into your mail. Once you have mail, you have everything, because virtually every account's password reset works through mail.
And this solution doesn't stop AI exfiltration of sensitive data, like those cookies, out to the internet. Or malware being downloaded into copy-on-write storage space to open a reverse shell and manipulate your existing browser sessions. But they don't mention that on the fancy splash page of the security tool.
The truth is that you actually need a sophisticated, complex-as-hell system to protect from AI attacks. There is no casual way to AI security. People need to know that, and splashy pages like this that give the appearance of security don't help the situation. Sure, it has disclaimers occasionally about it not being perfect security, read the security model here, etc. But the only people reading that are security experts, and they don't need a splash page!
Stanford: please change this page to be less misleading. If you must continue this project with its obviously insecure modes, you need to clearly emphasize how insecure it is by default. (I don't think it even qualifies as security software)
This looks nice, but on Mac you can virtualise really easily into microVMs now with https://github.com/apple/container.
I've built my own CLI that runs the agent + docker compose (for the app stack) inside a container for dev, and it's working great. I love --dangerously-skip-permissions. There's 0 benefit to us whitelisting the agent while it's in flight.
Anthropic's new auto mode looks like an untrustworthy solution in search of a problem - as an aside. Not sure who thought security == ml classification layer but such is 2026.
If you're on linux and have kvm, there's Lima and Colima too.
Just allowing YOLO mode, and sometimes rolling back.
Can we have a hardware-level implementation of git (the idea of files/data having their history preserved, not necessarily all the bells and whistles) in a future where storage is cheap?
This is not some magical new problem. Back your shit up.
You have no excuse for "it deleted 15 years of photos, gone, forever."
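In that spirit, even a dumb dated tarball taken before each agent run is enough to make "it deleted everything" recoverable. A minimal sketch, with illustrative paths:

```shell
# Snapshot a directory into a dated tarball before letting an agent loose on it.
src="$HOME/photos"
dest="$HOME/snapshots"
mkdir -p "$dest"
# -C changes into the parent dir so the archive contains a clean top-level folder.
tar -czf "$dest/$(basename "$src")-$(date +%F).tar.gz" \
    -C "$(dirname "$src")" "$(basename "$src")"
```

Restoring is just `tar -xzf snapshot.tar.gz -C /where/you/want/it`. Proper incremental tools (restic, borg, plain rsync) are better for anything large, but a tarball beats nothing.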
Also recommended:
Now we just need one for every python package.
I want agents to modify the file system. I want them to be able to manage my computer if they think it's a good idea. If a build fails due to running out of disk space, I want the agent to be able to find appropriate stuff to delete to free up space.
TLDR: It's easy: LLM outputs are untrusted. Agents, by virtue of acting on untrusted inputs, are malware. Handle them like the malware they are.
> "While this web site was obviously made by an LLM"

So I am expected to trust the LLM-written security model? https://jai.scs.stanford.edu/security.html
These guys are experts from a prestigious academic institution, leading "Secure Computer Systems", whose logo is a seven-branch red star that looks like a devil's head, with white palm trees in the background. They are also shilling for some blockchain research and a future digital currency initiative, taking funding from DARPA.
The website also points to external social networks as references, freely spreading fear, uncertainty and doubt.
So these guys are saying: go on, run malware on your computer, but do it with our casual sandbox, at your own risk.
Remember until yesterday Anthropic aka Claude was officially a supply chain risk.
If you want to experiment with agents safely (you probably can't), I recommend building them from the ground up (to be clear, I recommend you don't, but if you must): write the tools the LLM is allowed to use yourself, and determine at each step whether or not you broke the security model.
Remember that everything which comes from an LLM is untrusted. You'll be tempted to vibe-code your tools. The LLMs will try to make you install external dependencies, which you must review and decide whether or not you trust.
Because everything produced by the LLM is untrusted, sharing the results is risky. A good starting point is to have the LLM produce a single static HTML page. Serve that page from a webserver on an external server, relying on the Same-Origin Policy to prevent the page from accessing your files and network (e.g. GitHub Pages under a new handle if you can't afford a VPS). This way you rely on your browser sandbox to keep you safe, and you are as safe as when visiting a malware-infested page on the internet.
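Concretely, the local version of that workflow can be as small as the following (filenames are illustrative); for actual sharing you would push the same single file to an external host like GitHub Pages instead:

```shell
# Put only the generated artifact in an empty directory, then serve just that
# directory. The page then runs inside the browser sandbox, and the
# same-origin policy keeps it from reaching anything you host elsewhere.
mkdir -p /tmp/llm-output
cp generated.html /tmp/llm-output/index.html
python3 -m http.server 8080 --directory /tmp/llm-output
```

Serving from a dedicated empty directory matters: `http.server` happily exposes whatever directory it is started in.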
If you are afraid of writing tools, you can start by copy-pasting and reading everything produced.
Once you write tools, you'll want to have them run autonomously in a runaway loop, taking user or agent feedback as input. But even if everything is contained, these runaway loops can and will produce harmful content in your name.
Here is such a vibe-coded experiment I did a few days ago: a simple 2D physics simulation of water molecules for educational purposes. It is not physically accurate, and still has some bugs and regressions between versions. Good enough to be harmful. https://news.ycombinator.com/item?id=47510746
The irony is they used an LLM to write the entire (horribly written) text of that webpage.
When is HN gonna get a rule against AI-generated slop? Can't come soon enough.
This won't cause any confusion with the jai language :)
Ugh.
The name jai is very taken[1]... names matter.
[1]: https://en.wikipedia.org/wiki/Jai_(programming_language)
I saw it just 5 minutes ago: Claude misspelled a directory path. For me it was just creating a new folder, but I can imagine that if I hadn't stopped it, it could have started removing stuff just because it thinks it needs to start from scratch or something.