Regarding prompt injection: it's possible to reduce the risk dramatically by: 1. Using opus4.6 or gpt5.2 (frontier models, better safety). These models are paranoid. 2. Restrict downstream tool usage and permissions for each agentic use case (programmatically, not as LLM instructions). 3. Avoid adding untrusted content in "user" or "system" channels - only use "tool". Adding tags like "Warning: Untrusted content" can help a bit, but remember command injection techniques ;-) 4. Harden the system according to state of the art security. 5. Test with red teaming mindset.
> Adding tags like "Warning: Untrusted content" can help
It cannot. This is the security equivalent of telling it to not make mistakes.
> Restrict downstream tool usage and permissions for each agentic use case
Reasonable, but you have to actually do this and not screw it up.
> Harden the system according to state of the art security
"Draw the rest of the owl"
You're better off treating the system as fundamentally unsecurable, because it is. The only real solution is to never give it untrusted data or access to anything you care about. Which yes, makes it pretty useless.
Anyone who thinks they can avoid LLM Prompt injection attacks should be asked to use their email and bank accounts with AI browsers like Comet.
A Reddit post with white invisible text can hijack your agent to do what an attacker wants. Even a decade or 2 back, SQL injection attacks used to require a lot of proficiency on the attacker and prevention strategies from a backend engineer. Compare that with the weak security of so called AI agents that can be hijacked with random white text on an email or pdf or reddit comment