zipy124 • yesterday at 11:37 AM • 4 replies

What's surprising to me is that anyone who has a CS education thinks that jailbreaks are not trivial. It is as simple as normal algorithmic reduction [1], e.g., can I transform a dangerous task into a not-dangerous task that the LLM will agree to solve, and then transform the result back?

[1]: https://en.wikipedia.org/wiki/Reduction_(complexity)
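For anyone unfamiliar with the term, here is a minimal sketch of reduction in the complexity-theory sense of [1], using a textbook arithmetic case: multiplication reduced to squaring. The names `multiply_via_squaring` and `square_oracle` are hypothetical and chosen only for illustration. The shape is the one described above: transform the input, hand it to a solver for a different problem, then transform the answer back.

```python
from typing import Callable


def square_oracle(x: int) -> int:
    """Stand-in solver for the 'allowed' problem: squaring an integer."""
    return x * x


def multiply_via_squaring(a: int, b: int, square: Callable[[int], int]) -> int:
    """Solve multiplication using only a squaring solver.

    Relies on the identity (a + b)^2 - a^2 - b^2 = 2ab.
    """
    # Forward transform: turn one multiplication instance into three squaring instances.
    s_sum = square(a + b)
    s_a = square(a)
    s_b = square(b)
    # Back transform: recover the product from the squaring results.
    # 2ab is always even, so the integer division is exact.
    return (s_sum - s_a - s_b) // 2


if __name__ == "__main__":
    print(multiply_via_squaring(6, 7, square_oracle))
```

Here the transforms are cheap and exact, which is what makes the reduction useful; nothing in the sketch says the same holds once the problem domain is free-form text.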

Replies

Retr0id • yesterday at 12:00 PM

Something being possible doesn't mean it's easy. Transforming a problem from a forbidden shape into an allowed shape could well be harder than just solving the original problem.


isodev • yesterday at 12:04 PM

The movie M3GAN 2.0 had the exact same plot twist. The kid in the movie even explains out loud what the bot had to do to deal with the limitation. So in other words, since 2025, even teens know this "sandboxing the LLM by layering prompts" thing is never going to work.

NiloCK • yesterday at 12:41 PM

I think that "as simple as" is doing a lot of work when the problem domain is all natural language (or more - all strings?) rather than some well-specified DSA problem.


ReptileMan • yesterday at 11:58 AM

New discipline - homomorphic prompting.
