logoalt Hacker News

freeqazyesterday at 11:52 PM0 repliesview on HN

As somebody working in AI Security: There isn't one currently. If you're feeding untrusted inputs into an LLM (today), you have to treat the entire prompt as radioactive.

That means: - Limit the potential for a malicious prompt to do anything bad - Scope permissions to the lowest level you can

There are some other mitigations (moderation APIs using a 2nd LLM), but in general they're not 100% solutions. You really need to design your systems around accepting this limitation today.

More info on this wiki here: https://github.com/tldrsec/prompt-injection-defenses