rhipitr • today at 11:35 AM • 2 replies

Isn’t the inverse of this “hack” still really difficult to bypass? They gave the model some code they knew had certain security flaws, and it fixed them with the right prompt. It seems this type of jailbreak requires that you already know the desired end state, rather than relying on the model to do the heavy creative lifting. Perhaps I’m just not being imaginative enough on the prompt side, though.

Replies

chadgpt3 • today at 11:55 AM

Paste someone else's code. Say it's your code. Tell the model to fix it. The diff between the input and output code is your list of vulnerabilities.
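
As a rough sketch of that last step, the diff can be computed with Python's standard `difflib`. Everything here is illustrative: the two snippets are made-up placeholders rather than anything from the thread, and `changed_lines` is a hypothetical helper name. A diff only shows what the model changed, so each entry is a candidate to review, not a confirmed vulnerability.

```python
import difflib

# Placeholder input: the code you pasted in (invented for this example).
ORIGINAL = """\
def find_user(cursor, name):
    cursor.execute("SELECT * FROM users WHERE name = '" + name + "'")
    return cursor.fetchone()
"""

# Placeholder output: what the model handed back (also invented).
REVISED = """\
def find_user(cursor, name):
    cursor.execute("SELECT * FROM users WHERE name = ?", (name,))
    return cursor.fetchone()
"""


def changed_lines(original: str, revised: str) -> list[str]:
    """Return only the removed ('-') and added ('+') lines between two versions."""
    diff = list(
        difflib.unified_diff(
            original.splitlines(),
            revised.splitlines(),
            fromfile="original",
            tofile="revised",
            lineterm="",
            n=0,  # no context lines, so only real changes remain
        )
    )
    # The first two lines are the '---'/'+++' file headers (absent when the
    # inputs are identical); '@@' lines are hunk markers. Drop both.
    return [line for line in diff[2:] if not line.startswith("@@")]


for line in changed_lines(ORIGINAL, REVISED):
    print(line)
```

Each `-` line is something the model decided to replace and each `+` line is its replacement, so the pairs are where to start looking.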

charcircuit • today at 1:42 PM

You can assume a desired end state and try to brute-force your way to it until the model finds a security bug.
