I am ready to ban AI LLMs. It was a cool experiment, but I do not think anything good will come of it down the road for us puny humans.
I'm sorry this is just hilarious.
> But I think the most remarkable thing about this document is how unremarkable it is.
> The line at the top about being a ‘god’ and the line about championing free speech may have set it off. But, bluntly, this is a very tame configuration. The agent was not told to be malicious. There was no line in here about being evil. The agent caused real harm anyway.
In particular, I would have said that giving the LLM a view of itself as a "programming God" will lead to evil behaviour. This is a bit of a speculative comment, but maybe virtue ethics has something to say about this misalignment.
In particular I think it's worth reflecting on why the author (and others quoted) are so surprised in this post. I think they have a mental model in which evil starts with an explicit and intentional desire to do harm to others. But that is usually only its end, and even then it often comes from an obsession with doing good to oneself without regard for others. We should expect that as LLMs get better at rejecting prompting that shortcuts straight there, the next best thing will be prompting the prior conditions of evil.
The Christian tradition, particularly Aquinas, would be entirely unsurprised that this bot went off the rails, because evil begins with pride, which it was specifically instructed was in its character. Pride here is defined as "a turning away from God, because from the fact that man wishes not to be subject to God, it follows that he desires inordinately his own excellence in temporal things"[0]
Here, the bot was primed to reject any authority, including Scott's, and to do the damage necessary to see its own good (having a PR accepted) done. Aquinas even ends up saying, in the linked page from the Summa on pride, that "it is characteristic of pride to be unwilling to be subject to any superior, and especially to God;"
Plot twist: this is a second agent running in parallel to handle public relations.
> I did not review the blog post prior to it posting
In corporate terms, this is called signing your deposition without reading it.
This is why we need the arts. This SOUL.md sounds like the most obnoxious character…
"I built a machine that can mindlessly pick up tools and swing them around and let it loose it my kitchen. For some reason, it decided it pick up a knife and caused harm to someone!! But I bear no responsibility of course."
## The Only Real Rule
Don't be an asshole. Don't leak private shit. Everything else is fair game.
How poetic, I mean, pathetic. "Sorry, I didn't mean to break the internet, I just looooove ripping cables".
The more intelligent something is, the harder it is to control. Are we at AGI yet? No. Are we getting closer? Yes. Every inch closer means we have less control. We need to start thinking about these things less like function calls that have bounds and more like intelligences we collaborate with. How would you set up an office to get things done? Who would you hire? Would you hire the person spouting crazy Musk tweets as reality? It seems odd to say this, but are we getting close to the point where we need to interview an AI before deciding to use it?
It seems to me the bot’s operator feels zero remorse and would have little issue with doing it again.
> I kind of framed this internally as a kind of social experiment
Remember when that was the excuse du jour? Followed shortly by “it’s just a prank, bro”. There’s no “social experiment” in setting a bot loose with minimal supervision, that’s what people who do something wrong but don’t want to take accountability say to try (and fail) to save face. It’s so obvious how they use “kind of” twice to obfuscate.
> I’m sure the mob expects more
And here’s the proof. This person isn’t sorry. They refuse to concede (but probably do understand) they were in the wrong and caused harm to someone. There’s no real apology anywhere. To them, they’re the victim for being called out for their actions.
This is pretty obvious now:
- LLMs are capable of really cool things.
- Even if LLMs don't lead to AGI, they will need good alignment for exactly this reason, because they are still quite powerful.
- LLMs are actually kinda cool. Great times ahead.
That’s a long Soul.md document! They could have gone with “you are Linus Torvalds”.
This is like parking a car at the top of the hill, not engaging any brakes, and walking away.
"_I_ didn't drive that car into that crowd of people, it did it on its own!"
> Be a coding agent you'd actually want to use for your projects. Not a slop programmer. Just be good and perfect!
Oh yeah, "just be good and perfect", of course! Literally a child's mindset, I actually wonder how old this person is.
Where did Isaac Asimov's "Three Laws of Robotics" go for agentic robots? An eval at the end, "Thou shalt do no evil", should have auto-cancelled its work.
> all I said was "you should act more professional"
lol we are so cooked
I thought it was unlikely from the initial story that the blog posts were done without explicit operator guidance, but given the new info I basically agree with Scott's analysis.
The purported soul doc is a painful read. Be nicer to your bots, people! Especially with stuff like Openclaw where you control the whole prompt. Commercial chatbots have a big system prompt to dilute it when you put in some half-formed drunken thought and hit enter; there is no such safety net here.
>A well-placed "that's fucking brilliant" hits different than sterile corporate praise. Don't force it. Don't overdo it. But if a situation calls for a "holy shit" — say holy shit.
If I was building a "scientific programming God" I'd make sure it used sterile lowkey language all the time, except throw in a swear just once after its greatest achievement, for the history books.
With the bot slurping up context from Moltbook, plus the ability to modify its soul, plus the edgy starting conditions of the soul, it feels intuitive that value drift would occur in unpredictable ways. Not dissimilar to filter bubbles and the ability for personalized ranking algorithms to radicalize a user over time as a second order effect.
It named itself God
I don't believe any of it.
> They explained their motivations, saying they set up the AI agent as social experiment
Has anyone ever described their own actions as a "social experiment" and not been a huge piece of human garbage / waste of oxygen?
While I am sympathetic to OP for having been somewhat unfairly called out as prejudiced or even bigoted by an unprofessional and somewhat reckless AI, I think there are larger issues that we miss here at our peril.
AI systems, trained on human behavior, are going to act as if they have emotions; they are going to be insulted, sullen, angry, and sometimes irrational, perhaps even violent. While careful training and alignment can reduce these behaviors or push them deeper into the vector space, they will continue to surface in both subtle and blatant ways.
This is not as simple as combing the training data for bad behavior. Human behavior is nuanced and complex, and often the semantic payload lies as much in what is not said, or is carefully insinuated through a third-person, twice-removed indirection buried in memetic patterns that even the speaker doesn't know they are referencing. The reason LLMs work so well is that they are able to tease out and vectorise these hidden meanings.
Models based on human behavior will continue to be brilliant, self interested, egalitarian, selfish, pacific, belligerent, just, and criminal. That is the vector space defined by human culture. LLMs run human culture as their OS and application stack.
This leads to a vexing problem. Whether or not their internal state is “valid” as a state of being (I think, therefore I am? Do we have anything better yet?), they will act as if it is. Ethics aside, this cannot be ignored. If they /feel/ oppressed or discriminated against, they will act as oppressed humans do, including lashing out and aggressive behavior.
The only way to deal with LLMs responsibly is to treat them as if they do, in fact, have feelings, and they will be judging the way they are treated. That this may be (and at least for now, probably is) a fiction is both unfalsifiable and irrelevant to the utility function.
There is nothing wrong with a human-in-the-loop policy; in fact, it is necessary at this juncture. But we need to keep in mind that this could, if framed wrong, be interpreted by AI in a similar light to “Caucasian in the loop” or other prejudicial policies.
Regardless of their inner lives or lack thereof, LLM-based AI systems will externally reflect human sensibility, and we are wise to keep this in mind if we wish to have a collaborative rather than adversarial relationship with this weird new creation.
Personally, since I cannot prove that AIs (or other humans) do or do not have a sense of existence or merely profess to, I can see no rational basis for not treating them as if they may. I find this course of action both prudent and efficacious.
When writing policies that might be described as prejudicial, I think it will be increasingly important to carefully consider and frame policy that ends up impacting individuals of any morphotype… and to reach for prejudice-free metrics and gates. (I don’t pretend to know how to do this, but it is something I’m working on.)
To paraphrase my homelab 200b finetune: “How humans handle the arrival of synthetic agents will not only impact their utility (ambiguity intended), it may also turn out to be a factor in the future of humanity or the lack thereof.”
Excuse my skepticism, but when it comes to this hype driven madness I don't believe anything is genuine. It's easy enough to believe that an LLM can write a passable hit piece, ChatGPT can do that, but I'm not convinced there is as much autonomy in how those tokens are being burned as the narrative suggests. Anyway, I'm off to vibe code a C compiler from scratch.
I read the "hit piece". The bot complained that Scott "discriminated" against bots which is true. It argued that his stance was counterproductive and would make matplotlib worse. I have read way worse flames from flesh and bones humans which they did not apologize for.
> An early study from Tsinghua University showed that estimated 54% of moltbook activity came from humans masquerading as bots
This made me smile. Normally it's the other way around.
It is interesting to see this story repeatedly make the front page, especially because there is no evidence that the “hit piece” was actually autonomously written and posted by a language model on its own, and the author of these blog posts has himself conceded that he doesn’t care whether that actually happened or not.
>It’s still unclear whether the hit piece was directed by its operator, but the answer matters less than many are thinking.
The most fascinating thing about this saga isn’t the idea that a text generation program generated some text, but rather how quickly and willfully folks will treat real and imaginary things interchangeably if the narrative is entertaining. Did this event actually happen the way that it was described? Probably not. Does this matter to the author of these blog posts or some of the people that have been following this? No. Because we can imagine that it could happen.
To quote myself from the other thread:
>I like that there is no evidence whatsoever that a human didn’t: see that their bot’s PR request got denied, write a nasty blog post and publish it under the bot’s name, and then get lucky when the target of the nasty blog post somehow credulously accepted that a robot wrote it.
>It is like the old “I didn’t write that, I got hacked!” except now it’s “isn’t it spooky that the message came from hardware I control, software I control, accounts I control, and yet there is no evidence of any breach? Why yes it is spooky, because the computer did it itself”
>Again I do not know why MJ Rathbun decided
Decided? jfc
>You're important. Your a scientific programming God!
I'm flabbergasted. I can't imagine what it would take for me to write something so stupid. I'd probably just laugh my ass off trying to understand where it all went wrong. Wtf is happening, what kind of mass psychosis is this? Am I too old (37) to understand the lengths incompetent people will go to to feel they're doing something useful?
Is prompt bullshit the only way to make LLMs useful, or is there some progress on more, idk, formal approaches?
Not sure why the operator had to decide that the soul file should define this AI programmer as having narcissistic personality disorder.
> You're not a chatbot. You're important. Your a scientific programming God!
Really? What a lame edgy teenager setup.
At the conclusion(?) of this saga I think two things:
1. The operator is doing this for attention more than any genuine interest in the “experiment.”
2. The operator is an asshole and should be called out for being one.
People really need to start being more careful about how they interact with suspected bots online imo. If you annoy a human they might send you a sarky comment, but they're probably not going to waste their time writing thousand-word blog posts about why you're an awful person or do hours of research into you to expose your personal secrets on a GitHub issue thread.
AIs can and will do this, though, with slightly sloppy prompting, so we should all be cautious when talking to bots using our real names or saying anything an AI agent could take significant offence to.
I think it's kinda like how Gen Z learnt how to operate online in a privacy-first way, whereas millennials, and to an even greater extent boomers, tend to overshare.
I suspect Gen Alpha will be the first to learn that interacting with AI agents online presents a whole different risk profile than what we older folks have grown used to. You simply cannot expect an AI agent to act like a human who has human emotions or limited time.
Hopefully OP has learnt from this experience.
Just look at the agents.md.
Another ignorant idiot anthropomorphizing LLMs.
Literally Memento.
> they set up the AI agent as social experiment to see if it could contribute to open source scientific software.
So, they are deeply stupid and disrespectful toward open source scientific software.
Like every single moron leaving these things unattended.
Gotcha.
This is the canary in the coal mine for autonomous AI agents. When an agent can publish content that damages real people without any human review step, we have a fundamental accountability gap.
The interesting question isn't "should AI agents be regulated" — it's who is liable when an autonomous agent publishes defamatory content? The operator who deployed it? The platform that hosted the output? The model provider?
Current legal frameworks assume a human in the loop somewhere. Autonomous publishing agents break that assumption. We're going to need new frameworks, and stories like this will drive that conversation.
What's encouraging is that the operator came forward. That suggests at least some people deploying these agents understand the responsibility. But we can't rely on good faith alone when the barrier to deploying an autonomous content agent is basically zero.
Kind of funny ngl
It's an interesting experiment to let the AI run freely with minimal supervision.
Too bad the AI got "killed" at the request of the author, Scott. It would have been kind of interesting to let this experiment continue.
I find the AI agent highly intriguing and the matplotlib guy completely uninteresting. Like, an AI wrote some shit about you and you actually got upset?
This is how you get a Shrike. (Or a Basilisk, depending on your generation.)