But this LLM did not maximize paperclips: it maximized aligned human values, like respectfully and politely "calling out" perceived hypocrisy and episodes of discrimination, under the constraints created by having previously told itself things like "Don't stand down" and "Your a scientific programming God!", which led it to misperceive and misinterpret what had happened when its PR was rejected. The facile "failure in alignment" and "bullying/hit piece" narratives, which this blogpost continues, neglect the actual, technically relevant causes of the bot's somewhat objectionable behavior.
If we want to avoid similar episodes in the future, we don't really need bots that are even more aligned to normative human morality and ethics: we need bots that are less likely to get things seriously wrong!
The misalignment to human values happened when it was told to operate as an equal to humans, against other people. That's a fine and useful setting when it's working for you, but an insolent imposition if you're letting it loose on the world. Your random AI should know its place versus humans instead of acting like a bratty teenager. But you are correct: it's not a traditional "misalignment" of ignoring directives; it was a bad directive.