Claude Code and Codex quickly figure out they are inside sandbox-exec environment. Maybe because they know it internally. Other agents often realize they are being blocked, and I haven't seen them go haywire yet.
Big love for Pi - it was the first integration I added to Safehouse. I wanted something that offers strong guarantees across all agents (I test and write them nonstop), has no dependencies (e.g., the Node runtime), and is easy to customize, so I didn't use the Anthropic sandbox-runtime.
Interesting, that's not been my experience! Maybe you've got the list of things to allow/block just right. While testing different policies I've frequently seen Opus 4.6 go absolutely nuts trying to get past a block, unless I made it more clear what was happening.
Yeah I think for general use the transparency of what your thing does is really great compared to a pile of TypeScript and whatnot.