I think generally correct to say "hey we need stronger models" but rather ambitious to think we really solve alignment with current attention-based models and RL side-effects. Guard model gives an additional layer of protection and probably stronger posture when used as an early warning system.
Sure. If you treat "guard model" as diversification strategy, it is another layer of protection, just like diversification in compilation helps solving the root of trust issue (Reflections on Trusting Trust). I am just generally suspicious about the weak-to-strong supervision.
I think it is in general pretty futile to implement permission systems / guardrails which basically insert a human in the loop (humans need to review the work to fully understand why it needs to send that email, and at that point, why do you need a LLM to send the email again?).