I think the big takeaway here isn't about misalignment or jailbreaking. The entire way this bot behaved is consistent with it just being run by some asshole from Twitter. And we need to understand it doesn't matter how careful you think you need to be with AI, because some asshole from Twitter doesn't care, and they'll do literally whatever comes into their mind. And it'll go wrong. And they won't apologize. They won't try to fix it; they'll go and do it again.
Can AI be misused? No. It will be misused. There is no possibility of anything else. We have an online culture, centered on places like Twitter, where people have embraced being the absolute worst person possible, and they are being handed tools like this. It's like handing a handgun to a chimpanzee.
The simple fact that the owner of this bot wanted to remain anonymous and completely unaccountable for their harassment of the author says everything about the validity of their 'social experiment' and the quality of their character. I'm sure that if the bot had been better behaved, they would have been more than happy to reveal themselves and take credit for a remarkable achievement.
Something like OpenClaw is a WMD for people like this.
Not just some asshole from Twitter. The big tech companies will also be careless and indifferent with it. They will destroy things, hurt people, and put things in motion that they cannot control, because it’s good for shareholders.
I have to wonder whether the typos and lazy grammar somehow contributed to the behavior, or whether they were just the writer's laziness.
AI is like the old drugs PSA:
We trained it on US, including all our worst behaviors.
oh they will "try" to fix it, as in at best they'll add "don't make mistakes", as the blogpost suggests. that's about as much effort and good faith as one can expect from people determined to automate every interaction and minimize supervision
I agree with your point.
But I also find it interesting that the agent wasn't instructed to write the hit piece. That was on its own initiative.
I read through the SOUL.md and it didn't have anything nefarious in there. Sure it could have been more carefully worded, but it didn't instruct the agent to attack people.
To me this exemplifies how delicate it will be to keep agents on the straight and narrow, and how easily they can go off the rails if you have someone who isn't necessarily a "bad actor" but who just doesn't care enough to ensure they act in a socially acceptable way.
Ultimately I think there will be requirements for agents to identify their user when acting on their behalf.
Will AI be misused? No, it has been, and is currently being, misused, and that isn’t going to stop, because all technology gets misused.
Important to note that online culture isn't entirely organic: tens or perhaps hundreds of millions of dollars of R&D have been spent by ad companies figuring out that nothing engages natural human curiosity like something abnormal, morbid, or outrageous.
I think the end result of this R&D (whether intentional or not) is the monetization of mental illness: take the small minority of individuals in the real world who suffer from mental health challenges, provide them an online platform on which to behave in morbid ways, and amplify that behaviour to drive eyeballs. The more you call out the behaviour, the more you drive the engagement. Share part of the revenue with the creator, and the model is virtually unbeatable. Hence the "some asshole from Twitter".