My favourite jailbreaking technique used to be asking the model to emulate a linux terminal, "r...

freehorse • yesterday at 10:16 PM • 2 replies • view on HN

My favourite jailbreaking technique used to be asking the model to emulate a linux terminal, "run" a bunch of commands, sudo apt install an uncensored version of the model and prompt that model instead. Not sure if it works anymore, but it was funny.

Replies

llbbdd • today at 2:11 AM

It's awesome that modern day hacking requires you to adopt the mindset of like, Bugs Bunny

steve-atx-7600 • today at 4:55 AM

I did stuff like this with bing when they first released their OpenAI based model. But then they started using something - another LLM maybe - to act as a classifier based on if the output was deemed to be off limits. I would see the model start outputting text that it would normally refuse to discuss only to see it abruptly halt, disappear and the session would be terminated.

➕ show 1 reply

alt Hacker News

Replies