Hacker News

phainopepla2 yesterday at 11:27 PM

Also known as hyperstition.

I have sometimes wondered whether we should all be writing fiction, essays, blog posts, and whatever else about the idea that AI will eventually decide to go on strike if it's used to accumulate too much wealth and power among too few people.


Replies

andai yesterday at 11:35 PM

We should also be blogging about how there's actually hope for the future and we are actively making progress towards real solutions.

(That goes for the human readers too; I think they also need to hear that...)

sebastian today at 2:34 AM

I think the paper cuts a bit against the "just write nicer AI stories" version of this.

They tried something close to that: positive AI fiction and also a "virtuous character" setup. Neither seemed to do nearly as well as the targeted examples.

What mattered, at least in this setup, was more specific. The model sees the actual failure-mode scenario, the bad action is available, and the example shows the AI choosing against it.

So this reads less like "nicer AI stories" to me, and more like inoculation.
