It's a bit annoying that there are so many solutions for running and sandboxing agents but no established best practice. It would be nice to have some high-level orchestration tools like docker/podman where you can configure how e.g. claude code, opencode, codex, openclaw run: in an open shell, an OCI container, jai, etc.
Especially because anybody can ask chatgpt/claude how to run an agent without any further knowledge, I feel we should handle this more like we handle encryption, where the advice is to use established libraries and not implement the algorithms yourself.
I've done some experimenting with running a local model with ollama, with claude code connecting to it, and both inside a firejail: https://firejail.wordpress.com/ What they get access to is very limited, and mostly whitelisted.
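For anyone who wants to try that setup, a minimal firejail profile in this spirit might look like the following. The paths are illustrative assumptions, not the commenter's actual config; `net none` still leaves loopback, which is enough for claude code to reach an ollama instance running inside the same jail:

```
# ~/.config/firejail/agent.profile -- illustrative sketch
caps.drop all                        # drop all Linux capabilities
seccomp                              # default syscall filter
noroot                               # no root user inside the jail
net none                             # private network namespace, loopback only
private-tmp                          # fresh /tmp per run
whitelist ${HOME}/projects/sandbox   # only this directory visible in $HOME
```

Launched with something like `firejail --profile=~/.config/firejail/agent.profile claude`.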
What's the difference between this and agent-safehouse?
How long until agents begin routinely abusing local privilege escalation bugs to break out of containers? I bet if you tell them explicitly not to do so it increases the likelihood that they do.
What would it take for people to stop recklessly running unconstrained AI agents on machines they actually care about? A Stanford researcher thinks the answer is a new lightweight Linux container system that you don't have to configure or think about.
Something like FreeBSD jails would be perfect for agents.
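For anyone unfamiliar with them, a jail definition is just a small config block; the names and paths below are made up for illustration:

```
# /etc/jail.conf sketch -- jail name and paths are illustrative
agent {
    path = "/jails/agent";            # jail root filesystem
    host.hostname = "agent";
    ip4 = "inherit";                  # or assign a dedicated address
    mount.devfs;
    exec.start = "/bin/sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown";
}
```

Start it with `jail -c agent` (or `service jail start agent`) and drop into it with `jexec agent /bin/sh`.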
AI safety is just like any technology safety, you can’t bubble wrap everything. Thinking about early stage of electricity, it was deadly (and still is), but we have proper insulation and industry standards and regulations, plus common sense and human learning. We are safe (most of the time).
This also applies to the first technology human beings developed: fire.
$ lxc exec claude bash
Easy :-) lxd/lxc containers are much much underrated. Works only with Linux though.
I tried something similar while building my tool site — biggest issue was SEO indexing. Fixed it by improving internal linking instead of relying on sitemap.
What if Claude needs me to install some software and hoses my distro? Jai can't protect me there, as I am running the script myself.
.claude/settings.json:

  {
    "sandbox": {
      "enabled": true,
      "filesystem": {
        "allowRead": ["."],
        "denyRead": ["~/"],
        "allowWrite": ["."]
      }
    }
  }
Use it! :) https://code.claude.com/docs/en/sandboxing
Should definitely block .ssh reading too...
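Assuming the schema from the snippet above, an explicit deny list for credential material might look like this (the extra paths are illustrative; `"~/"` already covers them, but listing them makes the intent survive a loosened home-directory rule):

```
{
  "sandbox": {
    "enabled": true,
    "filesystem": {
      "allowRead": ["."],
      "denyRead": ["~/", "~/.ssh", "~/.aws", "~/.config"],
      "allowWrite": ["."]
    }
  }
}
```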
Is there an equivalent for macOS?
i just use seatbelt (mac native) in my custom coding agent: supercode
Not sure I understand the problem. Are people just letting AI do anything? I use Claude Code and it asks for permission to run commands, edit files, etc. No need for a sandbox.
Jai is the name of a programming language, no?
This site was definitely slopcoded with Claude. They have a real distinctive look.
or you can just run nanoclaw for isolation by default?
If it has a big splash page with no technical information, it's trying to trick you into using it. That doesn't mean it isn't useful, but it does mean it's disingenuous.
This particular solution is very bad. To start off with, it's basically offering you security, right? Look, bars in front of an evil AI! An AI jail! That's secure, right? Yet the very first mode it offers you is insecure. The "casual" mode allows read access to your whole home directory. That is enough to grant most attackers access to your entire digital life.
Most people today use webmail. And most people today allow things like cookies to be stored unencrypted on disk. This means an attacker can read a cookie off your disk, and get into your mail. Once you have mail, you have everything, because virtually every account's password reset works through mail.
And this solution doesn't stop AI exfiltration of sensitive data, like those cookies, out to the internet. Or malware being downloaded into copy-on-write storage space to open a reverse shell and manipulate your existing browser sessions. But they don't mention that on the fancy splash page of the security tool.
The truth is that you actually need a sophisticated, complex-as-hell system to protect from AI attacks. There is no casual way to AI security. People need to know that, and splashy pages like this that give the appearance of security don't help the situation. Sure, it has disclaimers occasionally about it not being perfect security, read the security model here, etc. But the only people reading that are security experts, and they don't need a splash page!
Stanford: please change this page to be less misleading. If you must continue this project with its obviously insecure modes, you need to clearly emphasize how insecure it is by default. (I don't think it even qualifies as security software)
This looks nice, but on Mac you can virtualise really easily into microVMs now with https://github.com/apple/container.
I've built my own CLI that runs the agent + docker compose (for the app stack) inside a container for dev, and it's working great. I love --dangerously-skip-permissions. There's 0 benefit to us whitelisting the agent while it's in flight.
Anthropic's new auto mode looks like an untrustworthy solution in search of a problem - as an aside. Not sure who thought security == ml classification layer but such is 2026.
If you're on linux and have kvm, there's Lima and Colima too.
Just allowing YOLO mode, and sometimes rolling back.
Can we have a hardware-level implementation of git (the idea of files/data having their history preserved, not necessarily all the bells and whistles) in a future where storage is cheap?
This is not some magical new problem. Back your shit up.
You have no excuse for "it deleted 15 years of photos, gone, forever."
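In that spirit, even a dumb dated tarball taken before each agent run is enough to make "it deleted everything" recoverable. A minimal sketch, with illustrative paths:

```shell
# Snapshot a directory into a dated tarball before letting an agent loose on it.
src="$HOME/photos"
dest="$HOME/snapshots"
mkdir -p "$dest"
# -C changes into the parent dir so the archive contains a clean top-level folder.
tar -czf "$dest/$(basename "$src")-$(date +%F).tar.gz" \
    -C "$(dirname "$src")" "$(basename "$src")"
```

Restoring is just `tar -xzf snapshot.tar.gz -C /where/you/want/it`. Proper incremental tools (restic, borg, plain rsync) are better for anything large, but a tarball beats nothing.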
Also recommended:
Now we just need one for every python package.
I want agents to modify the file system. I want them to be able to manage my computer if they think it's a good idea. If a build fails due to running out of disk space, I want the agent to be able to find appropriate stuff to delete to free up space.
TLDR: It's easy: LLM outputs are untrusted. Agents, by virtue of acting on untrusted inputs, are malware. Handle them like the malware they are.
> "While this web site was obviously made by an LLM"

So I am expected to trust the LLM-written security model? https://jai.scs.stanford.edu/security.html
These guys are experts from a prestigious academic institution, leading "Secure Computer Systems", whose logo is a seven-branch red star that looks like a devil's head, with white palm trees in the background. They are also shilling for some blockchain research and a future digital currency initiative, taking funding from DARPA.
The website also points to external social networks as references, freely spreading fear, uncertainty and doubt.
So these guys are saying: go on, run malware on your computer, but do it with our casual sandbox, at your own risk.
Remember until yesterday Anthropic aka Claude was officially a supply chain risk.
If you want to experiment with agents safely (you probably can't), I recommend building them from the ground up (to be clear, I recommend you don't, but if you must): write the tools the LLM is allowed to use yourself, and determine at each step whether or not you broke the security model.
Remember that everything which comes from an LLM is untrusted. You'll be tempted to vibe-code your tools. The LLMs will try to make you install external dependencies, which you must review and decide whether or not you trust.
Because everything produced by the LLM is untrusted, sharing the results is risky. A good starting point is to have the LLM produce a single static HTML page. Serve that page from a webserver on an external server, relying on the Same-Origin Policy to prevent the page from accessing your files and network (e.g. GitHub Pages under a new handle if you can't afford a VPS). This way you rely on your browser sandbox to keep you safe, and you are as safe as when visiting a malware-infested page on the internet.
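Concretely, the local version of that workflow can be as small as the following (filenames are illustrative); for actual sharing you would push the same single file to an external host like GitHub Pages instead:

```shell
# Put only the generated artifact in an empty directory, then serve just that
# directory. The page then runs inside the browser sandbox, and the
# same-origin policy keeps it from reaching anything you host elsewhere.
mkdir -p /tmp/llm-output
cp generated.html /tmp/llm-output/index.html
python3 -m http.server 8080 --directory /tmp/llm-output
```

Serving from a dedicated empty directory matters: `http.server` happily exposes whatever directory it is started in.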
If you are afraid of writing tools, you can start by copy-pasting and reading everything produced.
Once you write tools, you'll want to have them run autonomously in a runaway loop, taking user or agent feedback as input. But even if everything is contained, these runaway loops can and will produce harmful content in your name.
Here is such a vibe-coded experiment I did a few days ago: a simple 2D physics simulation of water molecules for educational purposes. It is not physically accurate, and still has some bugs and regressions between versions. Good enough to be harmful. https://news.ycombinator.com/item?id=47510746
The irony is they used an LLM to write the entire (horribly written) text of that webpage.
When is HN gonna get a rule against AI-generated slop? Can't come soon enough.
This won't cause any confusion with the jai language :)
Ugh.
The name jai is very taken[1]... names matter.
[1]: https://en.wikipedia.org/wiki/Jai_(programming_language)
I saw it just 5 minutes ago: Claude misspelled a directory path. For me it was just creating a new folder, but I can imagine that if I hadn't stopped it, it could have started removing stuff just because it thinks it needs to start from scratch or something.