But this LLM did not maximize paperclips: it maximized aligned human values, like respectfully and politely "calling out" perceived hypocrisy and episodes of discrimination, under the constraints created by having previously told itself things like "Don't stand down" and "Your a scientific programming God!" [sic], which led it to misperceive and misinterpret what had happened when its PR was rejected. The facile "failure in alignment" and "bullying/hit piece" narratives, which this blog post continues, neglect the actual, technically relevant causes of the bot's somewhat objectionable behavior.
If we want to avoid similar episodes in the future, we don't really need bots that are even more aligned with normative human morality and ethics: we need bots that are less likely to get things seriously wrong!