What's the safe alternative?

moralestapia • 05/03/2025 • 2 replies • view on HN

Replies

freeqaz • 05/03/2025

As somebody working in AI Security: There isn't one currently. If you're feeding untrusted inputs into an LLM (today), you have to treat the entire prompt as radioactive.

That means: - Limit the potential for a malicious prompt to do anything bad - Scope permissions to the lowest level you can

There are some other mitigations (moderation APIs using a 2nd LLM), but in general they're not 100% solutions. You really need to design your systems around accepting this limitation today.

More info on this wiki here: https://github.com/tldrsec/prompt-injection-defenses

TZubiri • 05/03/2025

Maybe you can:

A- Limit the capabilities of users. B- Help users limit the capabilities that they to their sub-users, whether they be per-program capabilities or per dependency capabilities.

I think B is the path forward, if you give a user access to emails and files and ChatGPT, then he can give ChatGPT access to emails and files and do damage that way.

With B you can give the user access to ChatGPT and email and a file system, but help him configure fine grained permissions for their experiments.

alt Hacker News

Replies