> Not only is this pure science fiction at this point, but injecting non-determinism into your defensive layer is terrifying and incredibly stupid. If you use an LLM to evaluate whether another LLM is doing something malicious, you now have two hallucination risks instead of one. You also risk a prompt-injection attack making it all the way to your security layer.
I've found fictional displays of "system compromise" kinda ridiculous in e.g. Halo. Now I know that Cortana throws AI slop input into AI slop infrastructure with thousands of subagents until she's in.