Is what you suggest about training even possible? Most exploitation techniques are really just about having in-depth knowledge of how components work. For example, I imagine a sufficiently powerful model could fairly easily re-invent the ROP chain from first principles if it just knew how the stack works. The same principle applies to much more complex attacks too; exploitation is often just an exercise in knowing vastly too much trivia, which LLMs tend to have in spades.
It would still degrade its effectiveness, which is what they claim to want. To exaggerate: if that weren't so, you'd only need fundamental math in the training data, since everything else can be derived from it.