If those prompt limits were reliable boundaries, it would be very cool.
But that would mean the alignment problem had been solved. When survival depends on watering down ethics, and there is a long, slippery slope of ethical depths available to plumb, relying on prompts for safety sounds risky.
Is that malicious? I don't think that would be considered malicious. We don't consider it immoral for starving people to steal food.
But I am fascinated by the idea too! I just think it is a terrible idea (despite being almost certain to happen).