Hacker News

ericbarrett · today at 12:46 AM (1 reply)

> GPT-4 now considers self modifying AI code to be extremely dangerous and doesn't like talking about it. Claude's safety filters began shutting down similar conversations a few months ago, suggesting the user switch to a dumber model.

I speculate that this has more to do with recent high-profile cases of self-harm linked to "AI psychosis" than with any AGI-adjacent danger. I've read a few of the chat transcripts made public in related lawsuits, and there is a recurring theme of recursive or self-modifying "enlightenment" role-played by the LLM. Discouraging exploration of these themes would be a logical change for the vendors to make.


Replies

andai · today at 4:59 AM

Heh, well, associating a potentially internet-ending line of research with mental illness does qualify as a societal prophylactic.