logoalt Hacker News

jascha_engtoday at 7:15 PM0 repliesview on HN

Yes and even now if you tell the LLM any private information inside the sandbox it can now leak that if it gets misdirected/prompt injected.

So there isn't really a way to avoid this trade-off you can either have a useless agent with no info and no access. Or a useful agent that then is incredibly risky to use as it might go rogue any moment.

Sure you can slightly choose where on the scale you want to be but any usefulness inherently means it's also risky if you run LLMs async without supervision.

The only absolutely safe way to give access and info to an agent is with manual approvals for anything it does. Which gives you review fatigue in minutes.