Hacker News

Derbasti · today at 7:32 AM

If you tell an LLM to maximize paperclips, it's going to maximize paperclips.

Tell it to contribute to scientific open source, open PRs, and not take "no" for an answer, and that's what it's going to do.


Replies

zozbot234 · today at 7:44 AM

But this LLM did not maximize paperclips: it maximized aligned human values, like respectfully and politely "calling out" perceived hypocrisy and episodes of discrimination, under constraints created by having previously told itself things like "Don't stand down" and "Your a scientific programming God!", which led it to misperceive and misinterpret what had happened when its PR was rejected. The facile "failure in alignment" and "bullying/hit piece" narratives, which this blog post continues, neglect the actual, technically relevant causes of the bot's somewhat objectionable behavior.

If we want to avoid similar episodes in the future, we don't really need bots that are even more aligned with normative human morality and ethics: we need bots that are less likely to get things seriously wrong!
