logoalt Hacker News

zipy124yesterday at 11:37 AM4 repliesview on HN

What's surprising to me is that anyone who has a CS education thinking that jailbreaks are not trivial. It is as simple as normal algorithmic reduction [1], e.g can I transform a dangerous task into a not-dangerous task that the LLM will agree to solve, and then re-transform back.

[1]: https://en.wikipedia.org/wiki/Reduction_(complexity)


Replies

Retr0idyesterday at 12:00 PM

Something being possible doesn't mean it's easy. Transforming a problem from a forbidden shape into an allowed shape could well be harder than just solving the original problem.

show 2 replies
isodevyesterday at 12:04 PM

The movie M3GAN 2.0 had the exact same plot twist. The kid in the movie even explains outloud what the bot had to do to deal with the limitation. So in other words, since 2025, even teens know this "sandboxing the LLM by layering prompts" thing is never going to work.

NiloCKyesterday at 12:41 PM

I think that as simple as is doing a lot of work when the problem domain is all natural language (or more - all strings?) rather than some well specified DSA problem.

show 1 reply
ReptileManyesterday at 11:58 AM

New discipline - homomorphic prompting.