>Comments should be passed to the model with clear role boundaries that prevent them from being interpreted as system-level directives.
Well, such clear boundaries would solve lots of problems. But those don’t exist, do they?
Yeah, I suspect the main reason this was rejected is simply because it's not fixable. This is just how LLMs work. This LLM ingests untrusted data, so there will always be a non-zero chance that this type of prompt injection succeeds.
Ah yes - the cure for world hunger: eating food.
You can get rid of 99.9% of those attacks by simply dispatching the data consumption to a different instance of the LLM, see, for instance, some of the later patterns in https://arxiv.org/abs/2506.08837