Hi Max, thanks for replying here!
These "defenses", are they "just" long sentences in the prompt begging the AI to not follow through with stuff like this? Or is it more like sub-agents running in sandboxes?