What's the safe alternative?
Maybe you can:
A- Limit the capabilities of users. B- Help users limit the capabilities that they to their sub-users, whether they be per-program capabilities or per dependency capabilities.
I think B is the path forward, if you give a user access to emails and files and ChatGPT, then he can give ChatGPT access to emails and files and do damage that way.
With B you can give the user access to ChatGPT and email and a file system, but help him configure fine grained permissions for their experiments.
As somebody working in AI Security: There isn't one currently. If you're feeding untrusted inputs into an LLM (today), you have to treat the entire prompt as radioactive.
That means: - Limit the potential for a malicious prompt to do anything bad - Scope permissions to the lowest level you can
There are some other mitigations (moderation APIs using a 2nd LLM), but in general they're not 100% solutions. You really need to design your systems around accepting this limitation today.
More info on this wiki here: https://github.com/tldrsec/prompt-injection-defenses