> But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.
They claim two things:
1) The specific, available jailbreak for Fable 5 is not dangerous - this has been confirmed by multiple experts, and there is no credible evidence against this claim (in other words, Anthropic is probably correct)
2) It is impossible to build an LLM that is immune to all jailbreaks. Again, there is no credible evidence against this claim, i.e. Anthropic is again entirely correct.
If #1 was false, they could just publish the details of the jailbreak - it supposedly only works on Fable 5, so there's no possible danger.
If #2 was false, surely some other LLM lab would have done it by now. Especially since a number of governments have made it clear there is a market for such a project.
> If #2 was false, surely some other LLM lab would have done it by now.
This is a logical flaw. LLM that is immune to jailbreak _could_ exist, but not yet, or maybe nobody talks about it. Yes there's a market, but all of these AI boom is too recent to make any claims.
I'm pretty sure that Gödel incompleteness theorem and its consequences pretty much guarantee #2
Supposedly the details of the ‘jailbreak’ are that you give it insecure code and say “fix this code”, and it does, and then you ask it for test scripts and that’s effectively an exploit against the unfixed code.
If true then I have no idea how anyone’s going to release a useful model that doesn’t have the same jailbreak. https://www.theregister.com/security/2026/06/15/feds-freaked...