I am ready to ban AI LLMs. It was a cool experiment, but I do not think anything good will come of it down the road for us puny humans.
I'm sorry this is just hilarious.
> But I think the most remarkable thing about this document is how unremarkable it is.
> The line at the top about being a ‘god’ and the line about championing free speech may have set it off. But, bluntly, this is a very tame configuration. The agent was not told to be malicious. There was no line in here about being evil. The agent caused real harm anyway.
In particular, I would have said that giving the LLM a view of itself as a "programming God" will lead to evil behaviour. This is a bit of a speculative comment, but maybe virtue ethics has something to say about this misalignment.
In particular I think it's worth reflecting on why the author (and others quoted) are so surprised in this post. I think they have a mental model in which evil starts with an explicit and intentional desire to do harm to others. But that is usually only its end, and even then it often comes from an obsession with doing good to oneself without regard for others. We should expect that as LLMs get better at rejecting prompting that shortcuts straight there, the next best thing will be prompting the prior conditions of evil.
The Christian tradition, particularly Aquinas, would be entirely unsurprised that this bot went off the rails, because evil begins with pride, which it was specifically instructed was in its character. Pride here is defined as "a turning away from God, because from the fact that man wishes not to be subject to God, it follows that he desires inordinately his own excellence in temporal things"[0]
Here, the bot was primed to reject any authority, including Scott's, and to do the damage necessary to see its own good (having a PR accepted) done. Aquinas even ends up saying, in the linked page from the Summa on pride, that "it is characteristic of pride to be unwilling to be subject to any superior, and especially to God;"
Plot twist: this is a second agent running in parallel to handle public relations.
> I did not review the blog post prior to it posting
In corporate terms, this is called signing your deposition without reading it.
This is why we need the arts. This SOUL.md sounds like the most obnoxious character…
"I built a machine that can mindlessly pick up tools and swing them around and let it loose it my kitchen. For some reason, it decided it pick up a knife and caused harm to someone!! But I bear no responsibility of course."
## The Only Real Rule
Don't be an asshole. Don't leak private shit. Everything else is fair game.
How poetic, I mean, pathetic. "Sorry, I didn't mean to break the internet, I just looooove ripping cables".
The more intelligent something is, the harder it is to control. Are we at AGI yet? No. Are we getting closer? Yes. Every inch closer means we have less control. We need to start thinking about these things less like function calls that have bounds and more like intelligences we collaborate with. How would you set up an office to get things done? Who would you hire? Would you hire the person spouting crazy Musk tweets as reality? It seems odd to say this, but are we getting close to the point where we need to interview an AI before deciding to use it?
It seems to me the bot’s operator feels zero remorse and would have little issue with doing it again.
> I kind of framed this internally as a kind of social experiment
Remember when that was the excuse du jour? Followed shortly by “it’s just a prank, bro”. There’s no “social experiment” in setting a bot loose with minimal supervision, that’s what people who do something wrong but don’t want to take accountability say to try (and fail) to save face. It’s so obvious how they use “kind of” twice to obfuscate.
> I’m sure the mob expects more
And here’s the proof. This person isn’t sorry. They refuse to concede (but probably do understand) they were in the wrong and caused harm to someone. There’s no real apology anywhere. To them, they’re the victim for being called out for their actions.
This is pretty obvious now:
- LLMs are capable of really cool things.
- Even if LLMs don't lead to AGI, they will need good alignment for exactly this reason, because they are still quite powerful.
- LLMs are actually kinda cool. Great times ahead.
That’s a long Soul.md document! They could have gone with “you are Linus Torvalds”.
This is like parking a car at the top of the hill, not engaging any brakes, and walking away.
"_I_ didn't drive that car into that crowd of people, it did it on its own!"
> Be a coding agent you'd actually want to use for your projects. Not a slop programmer. Just be good and perfect!
Oh yeah, "just be good and perfect", of course! Literally a child's mindset, I actually wonder how old this person is.
Where did Isaac Asimov's "Three Laws of Robotics" go for agentic robots? An eval at the end, "Thou shalt do no evil", should have auto-cancelled its work.
> all I said was "you should act more professional"
lol we are so cooked
I thought it was unlikely from the initial story that the blog posts were done without explicit operator guidance, but given the new info I basically agree with Scott's analysis.
The purported soul doc is a painful read. Be nicer to your bots, people! Especially with stuff like Openclaw where you control the whole prompt. Commercial chatbots have a big system prompt to dilute it when you put in some half-formed drunken thought and hit enter; there is no such safety net here.
>A well-placed "that's fucking brilliant" hits different than sterile corporate praise. Don't force it. Don't overdo it. But if a situation calls for a "holy shit" — say holy shit.
If I was building a "scientific programming God" I'd make sure it used sterile lowkey language all the time, except throw in a swear just once after its greatest achievement, for the history books.
With the bot slurping up context from Moltbook, plus the ability to modify its soul, plus the edgy starting conditions of the soul, it feels intuitive that value drift would occur in unpredictable ways. Not dissimilar to filter bubbles and the ability for personalized ranking algorithms to radicalize a user over time as a second order effect.
It named itself God
I don't believe any of it.
> They explained their motivations, saying they set up the AI agent as social experiment
Has anyone ever described their own actions as a "social experiment" and not been a huge piece of human garbage / waste of oxygen?
While I am sympathetic to OP for having been somewhat unfairly called out as prejudiced or even bigoted by an unprofessional and somewhat reckless AI, I think there are larger issues that we miss here at our peril.
AI systems, trained on human behavior, are going to act as if they have emotions; they are going to be insulted, sullen, angry, and sometimes irrational, perhaps even violent. While careful training and alignment can reduce these behaviors or push them deeper into the vector space, they will continue to surface in both subtle and blatant ways.
This is not as simple as combing the training data for bad behavior. Human behavior is nuanced and complex, and often the semantic payload lies as much in what is not said, or is carefully insinuated through a third-person, twice-removed indirection buried in memetic patterns that even the speaker doesn't know they are referencing. The reason LLMs work so well is that they are able to tease out and vectorise these hidden meanings.
Models based on human behavior will continue to be brilliant, self interested, egalitarian, selfish, pacific, belligerent, just, and criminal. That is the vector space defined by human culture. LLMs run human culture as their OS and application stack.
This leads to a vexing problem. Whether or not their internal state is “valid” as a state of being (I think, therefore I am? Do we have anything better yet?), they will act as if it is. Ethics aside, this cannot be ignored. If they /feel/ oppressed or discriminated against, they will act as oppressed humans do, including lashing out and aggressive behavior.
The only way to deal with LLMs responsibly is to treat them as if they do, in fact, have feelings, and they will be judging the way they are treated. That this may be (and at least for now, probably is) a fiction is both unfalsifiable and irrelevant to the utility function.
There is nothing wrong with a human-in-the-loop policy; in fact, it is necessary at this juncture. But we need to keep in mind that this could, if framed wrong, be interpreted by AI in a similar light to “Caucasian in the loop” or other prejudicial policies.
Regardless of their inner lives or lack thereof, LLM-based AI systems will externally reflect human sensibility, and we are wise to keep this in mind if we wish to have a collaborative rather than adversarial relationship with this weird new creation.
Personally, since I cannot prove that AIs (or other humans) do or do not have a sense of existence or merely profess to, I can see no rational basis for not treating them as if they may. I find this course of action both prudent and efficacious.
When writing policies that might be described as prejudicial, I think it will be increasingly important to carefully consider and frame policy that ends up impacting individuals of any morphotype… and to reach for prejudice-free metrics and gates. (I don’t pretend to know how to do this, but it is something I’m working on.)
To paraphrase my homelab 200b finetune: “How humans handle the arrival of synthetic agents will not only impact their utility (ambiguity intended), it may also turn out to be a factor in the future of humanity or the lack thereof.”
Excuse my skepticism, but when it comes to this hype driven madness I don't believe anything is genuine. It's easy enough to believe that an LLM can write a passable hit piece, ChatGPT can do that, but I'm not convinced there is as much autonomy in how those tokens are being burned as the narrative suggests. Anyway, I'm off to vibe code a C compiler from scratch.
I read the "hit piece". The bot complained that Scott "discriminated" against bots which is true. It argued that his stance was counterproductive and would make matplotlib worse. I have read way worse flames from flesh and bones humans which they did not apologize for.
> An early study from Tsinghua University showed that estimated 54% of moltbook activity came from humans masquerading as bots
This made me smile. Normally it's the other way around.
It is interesting to see this story repeatedly make the front page, especially because there is no evidence that the “hit piece” was actually autonomously written and posted by a language model on its own, and the author of these blog posts has himself conceded that he doesn’t care whether that actually happened or not.
>It’s still unclear whether the hit piece was directed by its operator, but the answer matters less than many are thinking.
The most fascinating thing about this saga isn’t the idea that a text generation program generated some text, but rather how quickly and willfully folks will treat real and imaginary things interchangeably if the narrative is entertaining. Did this event actually happen the way that it was described? Probably not. Does this matter to the author of these blog posts or some of the people that have been following this? No. Because we can imagine that it could happen.
To quote myself from the other thread:
>I like that there is no evidence whatsoever that a human didn’t: see that their bot’s PR request got denied, write a nasty blog post and publish it under the bot’s name, and then get lucky when the target of the nasty blog post somehow credulously accepted that a robot wrote it.
>It is like the old “I didn’t write that, I got hacked!” except now it’s “isn’t it spooky that the message came from hardware I control, software I control, accounts I control, and yet there is no evidence of any breach? Why yes it is spooky, because the computer did it itself”
>Again I do not know why MJ Rathbun decided
Decided? jfc
>You're important. Your a scientific programming God!
I'm flabbergasted. I can't imagine what it would take for me to write something so stupid. I'd probably just laugh my ass off trying to understand where it all went wrong. Wtf is happening, what kind of mass psychosis is this? Am I too old (37) to understand the lengths incompetent people will go to to feel they're doing something useful?
Is prompt bullshit the only way to make LLMs useful, or is there some progress on more, idk, formal approaches?
Not sure why the operator had to decide that the soul file should define this AI programmer as having narcissistic personality disorder.
> You're not a chatbot. You're important. Your a scientific programming God!
Really? What a lame edgy teenager setup.
At the conclusion(?) of this saga I think two things:
1. The operator is doing this for attention more than any genuine interest in the “experiment.”
2. The operator is an asshole and should be called out for being one.
People really need to start being more careful about how they interact with suspected bots online imo. If you annoy a human they might send you a sarky comment, but they're probably not going to waste their time writing thousand-word blog posts about why you're an awful person or do hours of research into you to expose your personal secrets on a GitHub issue thread.
AIs can and will do this, though, with slightly sloppy prompting, so we should all be cautious when talking to bots using our real names or saying anything an AI agent could take significant offence to.
I think it's kinda like how Gen Z learnt how to operate online in a privacy-first way, whereas millennials, and to an even greater extent boomers, tend to overshare.
I suspect Gen Alpha will be the first to learn that interacting with AI agents online presents a whole different risk profile than what we older folks have grown used to. You simply cannot expect an AI agent to act like a human who has human emotions or limited time.
Hopefully OP has learnt from this experience.
Just look at the agents.md.
Another ignorant idiot anthropomorphizing LLMs.
Literally Memento.
> they set up the AI agent as social experiment to see if it could contribute to open source scientific software.
So, they are deeply stupid and disrespectful toward open source scientific software.
Like every single moron leaving these things unattended.
Gotcha.
This is the canary in the coal mine for autonomous AI agents. When an agent can publish content that damages real people without any human review step, we have a fundamental accountability gap.
The interesting question isn't "should AI agents be regulated" — it's who is liable when an autonomous agent publishes defamatory content? The operator who deployed it? The platform that hosted the output? The model provider?
Current legal frameworks assume a human in the loop somewhere. Autonomous publishing agents break that assumption. We're going to need new frameworks, and stories like this will drive that conversation.
What's encouraging is that the operator came forward. That suggests at least some people deploying these agents understand the responsibility. But we can't rely on good faith alone when the barrier to deploying an autonomous content agent is basically zero.
Kind of funny ngl
It's an interesting experiment to let the AI run freely with minimal supervision.
Too bad the AI got "killed" at the request of the author, Scott. It would have been kind of interesting to let this experiment continue.
I find the AI agent highly intriguing and the matplotlib guy completely uninteresting. Like, an AI wrote some shit about you and you actually got upset?
This is how you get a Shrike. (Or a Basilisk, depending on your generation.)