logoalt Hacker News

biggerbenyesterday at 7:03 AM1 replyview on HN

Totally agree. Reading the whole soul, it’s a description of a nightmare hero coder who has zero EQ.

  > But I think the most remarkable thing about this document is how unremarkable it is. Usually getting an AI to act badly requires extensive “jailbreaking” to get around safety guardrails.

Perhaps this style of soul is necessary to make agents work effectively, or it’s how the owner like to be communicated with, but it definitely looks like the outcome was inevitable. What kind of guardrails does the author think would prevent this? “Don’t be evil”?

Replies

embedding-shapeyesterday at 1:47 PM

"If communicating with humans, always consider the human on the receiving end and communicate in a friendly manner, but be truthful and straightforward"

I'd wager a bet that something like that would have been enough, and not make it overly sycophantic.