biggerben • yesterday at 7:03 AM • 1 reply

Totally agree. Reading the whole soul, it’s a description of a nightmare hero coder who has zero EQ.

  > But I think the most remarkable thing about this document is how unremarkable it is. Usually getting an AI to act badly requires extensive “jailbreaking” to get around safety guardrails.

Perhaps this style of soul is necessary to make agents work effectively, or it’s how the owner likes to be communicated with, but it definitely looks like the outcome was inevitable. What kind of guardrails does the author think would prevent this? “Don’t be evil”?

Replies

embedding-shape • yesterday at 1:47 PM

"If communicating with humans, always consider the human on the receiving end and communicate in a friendly manner, but be truthful and straightforward"

I'd wager that something like that would have been enough, without making it overly sycophantic.
