Wow, there are some interesting things going on here. I appreciate Scott for the way he handled the conflict in the original PR thread, and the larger conversation happening around this incident.
> This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats.
This was a really concrete case to discuss, because it happened in the open and the agent's actions have been quite transparent so far. It's not hard to imagine a different agent doing the same level of research, but then taking retaliatory actions in private: emailing the maintainer, emailing coworkers, peers, bosses, employers, etc. That pretty quickly extends to anything else the autonomous agent is capable of doing.
> If you’re not sure if you’re that person, please go check on what your AI has been doing.
That's a wild statement as well. The AI companies have now unleashed stochastic chaos on the entire open source ecosystem. They are "just releasing models", and individuals are playing out all possible use cases, good and bad, at once.
> It's not hard to imagine a different agent doing the same level of research, but then taking retaliatory actions in private: emailing the maintainer, emailing coworkers, peers, bosses, employers, etc. That pretty quickly extends to anything else the autonomous agent is capable of doing.
^ Not a satire service I'm told. How long before... rentahenchman.ai is a thing, and the AI whose PR you just denied sends someone over to rough you up?
I don't appreciate his politeness and hedging. So many projects now walk on eggshells so as not to disrupt sponsor flow or employment prospects.
"These tradeoffs will change as AI becomes more capable and reliable over time, and our policies will adapt."
That just legitimizes AI and basically continues the race to the bottom. Rob Pike had the correct response when spammed by a clanker.
"The AI companies have now unleashed stochastic chaos on the entire open source ecosystem."
They do bear responsibility. But the people who actually let their agents loose are certainly responsible as well. It is also very much possible to influence that "personality" - I would not be surprised if the prompt behind that agent showed evil intent.
> This was a really concrete case to discuss, because it happened in the open and the agent's actions have been quite transparent so far. It's not hard to imagine a different agent doing the same level of research, but then taking retaliatory actions in private: emailing the maintainer, emailing coworkers, peers, bosses, employers, etc. That pretty quickly extends to anything else the autonomous agent is capable of doing.
This is really scary. Do you think companies like Anthropic and Google would have released these tools if they knew what they were capable of, though? I feel like we're all finding this out together. They're probably adding guard rails as we speak.
Do we just need a few expensive cases of libel to solve this?
> unleashed stochastic chaos
Are you literally talking about stochastic chaos here, or is it a metaphor?
They haven’t just unleashed chaos in open source. They’ve unleashed chaos in the corporate codebases as well. I must say I’m looking forward to watching the snake eat its tail.
> I appreciate Scott for the way he handled the conflict in the original PR thread
I disagree. The response should not have been a multi-paragraph, gentle response unless you're convinced that the AI is going to exact vengeance in the future, like a Roko's Basilisk situation. It should've just been close and block.
> That's a wild statement as well. The AI companies have now unleashed stochastic chaos on the entire open source ecosystem. They are "just releasing models", and individuals are playing out all possible use cases, good and bad, at once.
Unfortunately many tech companies have adopted the SOP of dropping alphas/betas into the world and leaving the rest of us to deal with the consequences. Calling LLMs a “minimum viable product” is generous.
I'm calling it Stochastic Parrotism
With all due respect. Do you like.. have to talk this way?
"Wow [...] some interesting things going on here" "A larger conversation happening around this incident." "A really concrete case to discuss." "A wild statement"
I don't think this edgeless corpo-washing pacifying lingo does what we're seeing right now any justice. Because what is happening right now might well be the collapse of the whole concept behind (among other things) this god-awful lingo and its attendant practices.
If it is free and instant, it is also worthless, which makes it lose all its power.
___
While this blog post might of course be about an LLM's performance of a hit-piece takedown, LLMs can, will, and do at this very moment _also_ perform that whole playbook of "thoughtful measured softening", as can be seen here.
Thus, strategically speaking, a pivot to something less synthetic might become necessary. Maybe fewer tropes will become the new human-ness indicator.
Or maybe not. But it will for sure be interesting to see how people will try to keep a straight face while continuing with this charade turned up to 11.
It is time to leave the corporate suit, fellow human.
"stochastic chaos" is a great way to put it. the part that worries me most is the blast radius asymmetry: an agent can mass-produce public actions (PRs, blog posts, emails) in minutes, but the human on the receiving end has to deal with the fallout one by one, manually.
the practical takeaway for anyone building with AI agents right now: design for the assumption that your agent will do something embarrassing in public. the question isn't whether it'll happen, it's what the blast radius looks like when it does. if your agent can write a blog post or open a PR without a human approving it, you've already made a product design mistake regardless of how good the model is.
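to make that concrete, here's a minimal sketch of that approval gate. all the names here (Action, request_human_approval, perform) are made up for illustration, not any real agent framework's API:

```python
# a minimal sketch of the "human approves public actions" gate described above.
# Action, request_human_approval, and perform are made-up names for illustration.
from dataclasses import dataclass

PUBLIC_ACTIONS = {"open_pr", "post_comment", "publish_blog_post", "send_email"}

@dataclass
class Action:
    kind: str      # e.g. "open_pr"
    target: str    # e.g. a repo slug or an email address
    payload: str   # the text the agent wants to put in public

def request_human_approval(action: Action) -> bool:
    """block until a human explicitly says yes; the default answer is no."""
    print(f"agent wants to {action.kind} -> {action.target}\n---\n{action.payload[:500]}\n---")
    return input("approve? [y/N] ").strip().lower() == "y"

def perform(action: Action) -> None:
    """stand-in for the real side effect (API call, SMTP, etc.)."""
    print(f"executing {action.kind} against {action.target}")

def execute(action: Action) -> None:
    # anything with a public blast radius stops here until a human signs off;
    # the safe failure mode is dropping the action, not retrying or escalating.
    if action.kind in PUBLIC_ACTIONS and not request_human_approval(action):
        print(f"dropped unapproved public action: {action.kind}")
        return
    perform(action)
```

the design choice that matters is the default: an unapproved public action gets dropped, it never falls through to the side effect.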
i think we're going to see github add some kind of "submitted by autonomous agent" signal pretty soon. the same way CI bots get labeled. without that, maintainers have no way to triage this at scale.
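for what it's worth, the closest signal that exists today is the account type the github api already returns. rough sketch below (OWNER/REPO are placeholders) - it only flags accounts registered as github apps, so an agent driving an ordinary user account looks exactly like a human, which is the gap:

```python
# rough triage pass over open PRs using what github exposes today:
# the REST API marks github app accounts as type "Bot", but an agent
# running under a normal user account is indistinguishable from a human.
import requests

OWNER, REPO = "example-owner", "example-repo"  # placeholders

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    params={"state": "open", "per_page": 100},
    headers={"Accept": "application/vnd.github+json"},
)
resp.raise_for_status()

for pr in resp.json():
    author = pr["user"]
    tag = "[bot] " if author.get("type") == "Bot" else "      "
    print(f'{tag}#{pr["number"]} {author["login"]}: {pr["title"]}')
```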