Hacker News

zozbot234 yesterday at 9:38 PM

> First of all I found that Fable is trained in a way that, even if you were to jailbreak it, it would be completely uninterested in exploitation or in finding creative solutions for exploitation.

This is quite relevant if true. People have tried to argue for this restriction by claiming the exact opposite, i.e. that a basic jailbreak of Fable immediately exposes Mythos's cyber offense capabilities (e.g. https://news.ycombinator.com/item?id=48519695). It makes a lot of sense that Fable would also be fine-tuned or steered away from cyber offense topics, since they're reasonably easy to identify and Anthropic has demonstrated this capability with respect to other topics.


Replies

himata4113 yesterday at 9:48 PM

I mean, it's possible that I just haven't found the secret sauce, or that I'm running into invisible guardrails, and that other people have much stronger jailbreaks than I do.

However, I would not rule out OpenAI involvement in all of this.
