logoalt Hacker News

freehorseyesterday at 10:16 PM2 repliesview on HN

My favourite jailbreaking technique used to be asking the model to emulate a linux terminal, "run" a bunch of commands, sudo apt install an uncensored version of the model and prompt that model instead. Not sure if it works anymore, but it was funny.


Replies

llbbddtoday at 2:11 AM

It's awesome that modern day hacking requires you to adopt the mindset of like, Bugs Bunny

steve-atx-7600today at 4:55 AM

I did stuff like this with bing when they first released their OpenAI based model. But then they started using something - another LLM maybe - to act as a classifier based on if the output was deemed to be off limits. I would see the model start outputting text that it would normally refuse to discuss only to see it abruptly halt, disappear and the session would be terminated.

show 1 reply