logoalt Hacker News

sigmoid10today at 7:25 AM1 replyview on HN

I think "prompt injection prevention" systems fall into the same category as "llm writing detection" systems. I.e. reality is always a step ahead and you shouldn't trust either one for anything remotely important.


Replies

kirtivrtoday at 8:31 AM

Yeah, the problem reduces to trying to restrict a motivated model which is trying to exfiltrate data.

That's a problem we are just now wrapping our minds around.

It's not as simple as prompt sanitization. The model is the interpreter, and we don't yet have the right tools to guide it.