I bet you're right. This is the kind of thing you need a meticulous programmer to do. But instead, I'd guess most AI-dogfooding engineering organizations in the near future will take a vibe-code-it-and-AI-red-team-it approach.
I don't trust sandbox claims from those companies, and I only run CLI-ish code on my workstation inside a full VM (not even a container).
> not even a container
Genuinely curious, what specific threats are you thinking about when you make this choice?