simonw • today at 12:34 AM • 3 replies

The challenge I'm finding with sandboxes like this is evaluating them in comparison to each other.

This looks like a competent wrapper around sandbox-exec. I've seen a whole lot of similar wrappers emerging over the past few months.

What I really need is help figuring out which ones are trustworthy.

I think this needs to take the form of documentation combined with clearly explained and readable automated tests.

Most sandboxes - including sandbox-exec itself - are massively under-documented.

If I am going to trust them I need both detailed documentation and proof that they work as advertised.

Replies

e1g • today at 12:40 AM

Thank you for your work - I have sent many of your links to my people.

Your point is totally fair for evaluating security tooling. A few notes -

1. I implemented this in Bash to avoid having an opaque binary in the way.

2. All sandbox-exec profiles are split up into individual files by specific agent/integration, and are easily auditable (https://github.com/eugene1g/agent-safehouse/tree/main/profil...)

3. There are E2E tests validating sandboxing behavior under real agents

4. You don't even need the Safehouse Bash wrapper, and can use the Policy Builder to generate a static policy file with minimal permissions that you can feed to sandbox-exec directly (https://agent-safehouse.dev/policy-builder). Or feed the repo to your LLMs and have them write your own policy from the many examples.

5. This whole repo should be a StrongDM-style readme to copy&paste to your clanker. I might just do that "refactor", but for now I've added LLM instructions to create your own sandbox-exec profiles: https://agent-safehouse.dev/llm-instructions.txt
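
To illustrate the "static policy file fed to sandbox-exec directly" route from point 4, here is a minimal sketch in Python. It is not taken from Safehouse or its Policy Builder: the helper names `build_profile` and `sandbox_command` and the file name `agent.sb` are hypothetical, and the profile is a bare deny-by-default example that real agents would almost certainly need to extend with further allowances. It only renders the profile text and the command line; it does not run anything.

```python
def build_profile(writable_dir: str) -> str:
    """Render an illustrative sandbox profile (SBPL) as text.

    Assumption: deny everything by default, allow process execution and
    reads, and allow writes only under `writable_dir`. This is a sketch,
    not a vetted policy.
    """
    # Escape backslashes and double quotes so the path stays a valid
    # double-quoted string literal inside the profile.
    escaped = writable_dir.replace("\\", "\\\\").replace('"', '\\"')
    rules = [
        "(version 1)",
        "(deny default)",
        "(allow process-exec)",
        "(allow process-fork)",
        "(allow file-read*)",
        f'(allow file-write* (subpath "{escaped}"))',
    ]
    return "\n".join(rules) + "\n"


def sandbox_command(profile_path: str, argv: list[str]) -> list[str]:
    """Build the argv that runs `argv` under sandbox-exec with a profile file."""
    return ["sandbox-exec", "-f", profile_path, *argv]


if __name__ == "__main__":
    # Print the profile and the command instead of executing them.
    print(build_profile("/tmp/agent-work"), end="")
    print(sandbox_command("agent.sb", ["bash", "-c", "touch /tmp/agent-work/ok"]))
```

On macOS, saving the rendered text to a file and passing it with `-f` is how a static profile reaches sandbox-exec; keeping the profile as plain text is what makes it auditable without reading any wrapper code.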

kstenerud • today at 5:34 AM

If you're looking for one better documented and tested, you might like https://github.com/kstenerud/yoloai

vasco • today at 7:07 AM

So create a 'destroy my computer' test harness and run it whenever you test another wrapper. If it works you'll be fine. If it doesn't you buy a new computer.
