logoalt Hacker News

moralestapiayesterday at 11:32 PM2 repliesview on HN

What's the safe alternative?


Replies

freeqazyesterday at 11:52 PM

As somebody working in AI Security: There isn't one currently. If you're feeding untrusted inputs into an LLM (today), you have to treat the entire prompt as radioactive.

That means: - Limit the potential for a malicious prompt to do anything bad - Scope permissions to the lowest level you can

There are some other mitigations (moderation APIs using a 2nd LLM), but in general they're not 100% solutions. You really need to design your systems around accepting this limitation today.

More info on this wiki here: https://github.com/tldrsec/prompt-injection-defenses

TZubiriyesterday at 11:59 PM

Maybe you can:

A- Limit the capabilities of users. B- Help users limit the capabilities that they to their sub-users, whether they be per-program capabilities or per dependency capabilities.

I think B is the path forward, if you give a user access to emails and files and ChatGPT, then he can give ChatGPT access to emails and files and do damage that way.

With B you can give the user access to ChatGPT and email and a file system, but help him configure fine grained permissions for their experiments.