sigmoid10 • today at 7:25 AM • 1 reply

I think "prompt injection prevention" systems fall into the same category as "LLM writing detection" systems, i.e. reality is always a step ahead, and you shouldn't trust either one for anything remotely important.

Replies

kirtivr • today at 8:31 AM

Yeah, the problem reduces to trying to restrict a motivated model that is trying to exfiltrate data.

That's a problem we are just now wrapping our minds around.

It's not as simple as prompt sanitization. The model is the interpreter, and we don't yet have the right tools to guide it.