Hacker News

wrs today at 5:15 PM

>Comments should be passed to the model with clear role boundaries that prevent them from being interpreted as system-level directives.

Well, such clear boundaries would solve lots of problems. But those don’t exist, do they?


Replies

mattalex today at 7:10 PM

You can get rid of 99.9% of those attacks by simply dispatching the data consumption to a different instance of the LLM. See, for instance, some of the later patterns in https://arxiv.org/abs/2506.08837
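
One way to read "dispatching the data consumption to a different instance" is roughly the sketch below. It is only an illustration under assumptions, not the paper's reference implementation: the `LLM` type, `quarantined_classify`, `privileged_plan`, and the label set are all hypothetical names, and the model is a plain callable so the example runs without any API. The idea is that the instance reading the untrusted comment has no tools and may only hand back a value from a fixed allow-list, while the instance that can act never sees the comment text.

```python
from typing import Callable

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]

# Assumed label set for illustration; the only values allowed to cross the boundary.
ALLOWED_LABELS = {"bug_report", "question", "spam"}


def quarantined_classify(untrusted_comment: str, quarantined_llm: LLM) -> str:
    """Let a tool-less instance read the untrusted text; accept only a fixed label."""
    raw = quarantined_llm(
        "Classify the comment as bug_report, question, or spam.\n\n"
        + untrusted_comment
    )
    label = raw.strip().lower()
    # Anything outside the allow-list is discarded, so free-form injected
    # text cannot flow onward to the privileged side.
    return label if label in ALLOWED_LABELS else "spam"


def privileged_plan(label: str) -> str:
    """The privileged side sees only the validated label, never the comment."""
    actions = {
        "bug_report": "open_ticket",
        "question": "draft_reply",
        "spam": "ignore",
    }
    return actions[label]
```

Note that this does not make the quarantined instance immune to injection. A hijacked instance can still return the wrong label; the design only limits what a successful injection is able to influence.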

InsideOutSanta today at 5:38 PM

Yeah, I suspect the main reason this was rejected is simply that it's not fixable. This is just how LLMs work. This LLM ingests untrusted data, so there will always be a non-zero chance that this type of prompt injection succeeds.

chias today at 7:03 PM

Ah yes - the cure for world hunger: eating food.