hardsnow • today at 6:35 PM

I’ve been developing an open-source version of something similar[1] and have used it quite extensively (well over 1k PRs)[2]. I’m definitely a believer in the “prompt to PR” model. It’s very liberating to not have to think about managing the agent sessions. It seems that you have built a lot of useful tooling (e.g., session videos) around this core idea.

A couple of learnings to share that I hope could be of use:

1) Execution sandboxing is just the start. For any enterprise usage, you also want fairly tight network egress control to limit the chance of accidental leaks or malicious exfiltration if there’s any risk of untrusted material getting into the model context. Speaking as a decision maker at a tech company, we do actually review this kind of thing when evaluating tools.

2) Once you have proper network sandboxing, you can secure credentials much better: give the agent only dummy surrogates and swap them for the real creds on the way out.

3) Sandboxed agents with automatic provisioning of a workspace from git can be used for more than just development tasks. In fact, it might be easier to find initial traction with more constrained and thus more predictable tasks, e.g., “ask my codebase” or “debug CI failures”.
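
To make points 1 and 2 concrete, here is a rough sketch of what an egress proxy could do with each outbound request: deny hosts that are not on an allowlist, then swap the surrogate token for the real one. Everything in it (the host list, the token values, the `rewrite_outbound` name) is made up for illustration and is not taken from any particular tool.

```python
from urllib.parse import urlsplit

# Hypothetical values, for illustration only.
ALLOWED_HOSTS = {"api.github.com"}
SURROGATE_TO_REAL = {"dummy-token-123": "real-token-abc"}


def rewrite_outbound(url: str, headers: dict) -> dict:
    """Return the headers to forward, or raise PermissionError if egress is denied."""
    host = urlsplit(url).hostname
    # Egress control: refuse anything that is not explicitly allowed.
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"egress to {host!r} is not allowed")

    forwarded = dict(headers)  # copy, so the caller's dict is left untouched
    scheme, _, token = forwarded.get("Authorization", "").partition(" ")
    # Credential swap: the sandbox only ever sees the surrogate token.
    if token in SURROGATE_TO_REAL:
        forwarded["Authorization"] = f"{scheme} {SURROGATE_TO_REAL[token]}"
    return forwarded
```

Because the allowlist check runs first, the real credential is only ever attached to requests going to an approved host. A real proxy would also need case-insensitive header handling and probably a per-host credential mapping.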

[1] https://airut.org [2] https://haulos.com/blog/building-agents-over-email/

Replies

willydouhard • today at 7:02 PM

Willy from Twill here.

I love the idea of emailing agents like we email humans! Thank you for sharing your learnings:

1. Network constraints vary quite a bit from one enterprise customer to another, so right now this is something we handle on a case-by-case basis with them.

2. We came to the same conclusion. For sensitive credentials like LLM API keys, we generate ephemeral keys so the real keys never touch the sandbox.

3. Totally right, we support constrained tasks too (ask mode, automated CI fixes). We've gone back and forth on whether to go vertical-first or stay generic. We're still figuring out where the sweet spot is. The constrained tasks are more reliable today, but the open-ended ones are where teams get the most leverage.
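
For readers curious what point 2 could look like, here is a rough sketch of the general ephemeral-key idea, not our actual implementation: a gateway outside the sandbox mints a short-lived random key and later checks that it has not expired. The `EphemeralKeyStore` class, its method names, and the 15-minute default are all assumptions for illustration.

```python
import secrets
import time
from typing import Optional


class EphemeralKeyStore:
    """Hypothetical in-memory store of short-lived keys (illustration only)."""

    def __init__(self, ttl_seconds: float = 900.0):
        self.ttl_seconds = ttl_seconds
        self._expiry = {}  # key -> expiry timestamp

    def mint(self, now: Optional[float] = None) -> str:
        """Create a random key that expires ttl_seconds after `now`."""
        now = time.monotonic() if now is None else now
        key = secrets.token_urlsafe(32)
        self._expiry[key] = now + self.ttl_seconds
        return key

    def is_valid(self, key: str, now: Optional[float] = None) -> bool:
        """True only for keys minted by this store that have not yet expired."""
        now = time.monotonic() if now is None else now
        expires_at = self._expiry.get(key)
        return expires_at is not None and now < expires_at
```

The sandbox would only receive the minted key. The gateway holding the real LLM API key would call `is_valid` before forwarding a request upstream.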
