Feds freaked over Fable 5 after 'fix this code', not jailbreak, say researchers

529 points • by _tk_ • today at 9:26 AM • 311 comments

Comments

Lol "fix this code" is beautiful.

It basically jailbroke the "no security vulnerabilities" guard rails, not in any clever way but just by fixing the bugs, and it produced exploit code simply by writing test cases to make sure each bug is fixed. So a human only needs to look at the code and tests to get the vulnerabilities and the exploits (or their components).

What makes this so beautiful IMHO is that it's a trivial jailbreak, but also close to unfixable. At least not without making the model close to useless for normal development (it refuses to fix bugs/write code) or making it a major liability (it silently pretends it didn't see bugs and silently avoids fixing them, which for a human would count as intentional sabotage and might involve criminal liability).

martinald • today at 11:07 AM

If you set aside political menace, this is a huge problem with Anthropic's strategy.

You _cannot_ say that Mythos is super dangerous and can only be rolled out to certain people, but then release Fable with anything other than bulletproof cyber denials.

Clearly with LLMs, bulletproof denials are ~impossible due to the way LLMs work.

So you've ended up in a situation where Anthropic is simultaneously claiming it's an incredibly dangerous model _and_ that there are (minor, potentially) problems with the security "protections".

As technical people we understand that nothing can be perfect, esp in LLM world. But all my non technical friends were really confused how they had managed to make the model "safe" so quickly when it was released and the general sentiment was it shouldn't have been released - and now to an outsider I think it looks like it was never safe at all to release, so I can totally see how the current US administration have got themselves very upset with it.

_Even if_ there was no political bad will, it's a bit of a silly scenario to end up in, and really quite easily foreseen.

jpcompartir • today at 11:18 AM

They weren't freaked by anything, it's a retaliatory shakedown after ideological differences and Anthropic not doing exactly what they're told/what the Admin wants them to do.

peter422 • today at 4:41 PM

Also for all the people saying Amazon's part in this couldn't be fabricated, remember that Amazon is a "friend of the administration". During Andy Jassy's tenure, they paid $75MM (wildly outbidding everybody else) for a Melania documentary that grossed ~16MM, a move publicly defended by Jeff Bezos. Any neutral observer could see this was a wild overpay, and after the fact, a terrible business move. But that is not what Amazon said or continues to say. This was just a bribe with more steps to it.

When the government comes out and says this is due to something Amazon pointed out, even if that is a complete lie, they know that Amazon won't say anything publicly about it. Amazon wants to maintain their "friend of the administration" status that they paid a lot of money to get.

It is frustrating for all of us to have to think about our government like this, but if you just look at the reality of what is happening it is very difficult to trust not only anything the government is saying, but also anything companies aligned with the government are saying.

bonsai_spool • today at 11:22 AM

Here’s the blog post referenced in the article that’s written by the person who reviewed the paper that purportedly found a ‘jailbreak’

https://www.lutasecurity.com/post/the-fable-5-export-control...

embedding-shape • today at 11:25 AM

> “‘Fix this code,’ plus several manual steps to generate test scripts,

Feels like the title isn't really giving the full context of what they ended up actually seeing, despite what the lede implies multiple times.

Still, ban seems stupid... Still no actual leak of the full "third-party research paper"?

9cb14c1ec0 • today at 11:57 AM

Meanwhile Deepseek V4 Flash will happily hunt security vulns at almost 0 cost. We are ceding the bug hunting to the open weight models.

jp57 • today at 4:46 PM

I think this brings out the cognitive dissonance around "safety" regarding cyber security:

a) In order to make us safe, the LLM should help us find (and fix) the vulnerabilities in our own code.

b) In order for us to be safe, the LLM should not find vulnerabilities in other people's code.

I don't think this is resolvable in a way where both (a) and (b) win.

mlhpdx • today at 1:15 PM

It’s possible that the nut of the problem here isn’t exploits, but the fixes themselves. If the model is capable of identifying and fixing things it “shouldn’t,” like back doors, that would throw a wrench in things hard enough to freak out the wrong people, perhaps?

rhipitr • today at 11:35 AM

Isn’t the inverse of this “hack” really difficult to bypass still? They gave the model some code they knew had certain security flaws and it fixed them with the right prompt. It seems this type of jailbreak requires that you already know a desired end state, rather than relying on the model to do the heavy creative lifting. Perhaps I’m just not being imaginative enough on the prompt side here though.

bilalq • today at 5:20 PM

I suspect we'll eventually hit a point where possession or usage of powerful open models will be criminalized.

redox99 • today at 12:34 PM

>"fix this code"

>it fixes it

oh my god.

thinkindie • today at 4:12 PM

As a European, I really don't get where this strategy is supposed to take the USA. It's pretty clear everyone is getting scared by changes like this that happen overnight, without clear reason and completely unpredictably.

Business requires a stable environment, and Trump is doing everything in his power to disrupt business stability. Ultimately, I see the rest of the world (especially Europe) relying less and less on US tech. The long-term damage is done.

All the US companies that used to think of the entire world (minus China) as their market will figure out that it is much smaller than they used to think.

jrochkind1 • today at 3:02 PM

So the problem is not Fable's ability to exploit, but that they don't want people to have access to its ability to patch vulnerabilities?

Wow.

Cider9986 • today at 1:01 PM

Is "defenders" a common term in cybersecurity? Idk why but it's giving "war fighters" vibes. I've noticed it in all the Anthropic blog posts and then this one.

ChrisRR • today at 12:50 PM

I haven't been following this story, but the US wanted claude to not be able to find bugs in code?

rotis • today at 3:34 PM

I have problems reconciling this story with the Amazon one from a few days ago. If we take both as true, doesn't that basically imply Amazon researchers got scared by the ‘Fix this code’ prompt first and then spooked the feds? Shouldn't we make fun of those researchers first? I don't know. I feel there is a lie hiding somewhere in plain sight.

antirez • today at 2:50 PM

They didn't freak out, since the order still allowed 350 million people to use it: in such a large population there is everything, including individuals who are very much against the country, the government and so forth. If they had really freaked out they would have said "we need to investigate, you have to retire the model". That would be a more defensible POV at least.

andai • today at 6:03 PM

>“To pull the best capabilities away from defenders without a good reason when our adversaries are rapidly advancing is dangerous,” they wrote.

But Fable already couldn't do security work, right?[0] Security work was already limited to Mythos, which is still available to US orgs right? (I assume they had to revoke access to foreign organizations though.)

[0] Well, in theory. This exploit is pretty funny, but I heard the safety filters were heavy handed.

merlindru • today at 2:02 PM

this is basically trying to enforce security-by-obscurity, which is a terrible idea all around. it's just a model. the security issues still exist and are exploitable.

and after staking the economy on AI, you can't really put a cap on intelligence. if models are not allowed to be better than Opus 4.8, then the whole investment structure is about to unravel.

why invest billions and billions into AI if returns are artificially capped?

leemoore • today at 4:19 PM

It's the executive branch asserting control in this space and requiring all SOTA model providers to bend the knee. Anthropic is the least capable of playing the bend the knee game so is getting the first and worst smack down

cryptonector • today at 8:42 PM

I've had to convince ChatGPT that code is mine before it would do a security review.

rock_artist • today at 11:16 AM

I'm not sure I've understood it correctly.

So, basically the model didn't agree to expose possible vulnerabilities but agreed to patch them?

Regardless of the request to take Fable 5 down, why is asking the model to show vulnerabilities blocked if fixing them is not? Is it based on an assumption about intent?

I don't quite get the benefit of limiting it. So if anyone can explain it better it'll be appreciated.

benmusch • today at 3:17 PM

Headline is dumb, the point is that not mentioning security in the prompt is effectively a jailbreak.

The shutdown may be dumb/politically motivated, but this definitely is a jailbreak even if it's a very simple one

blitzar • today at 12:04 PM

The code is correct; humanity needs fixing.

Kill all humans, kill all humans.

hedora • today at 2:43 PM

Note that Anthropic is still lobbying for the government to exert centralized control over models, so both sides of the “debate” have taken a pro fascist stance.

The “AI ethics” teams at these companies are the spearhead of the attack on democracy and civil society. Anyone that has taken a high school level history class, let alone read any important ethics literature would know that “centralize control over thought, speech and technology” is a fundamentally unethical stance.

For these groups to claim they are ethics researchers is offensive.

(I’m using the Wikipedia definition of fascism: “Fascism is characterized by support for a dictatorial leader, centralized autocracy, militarism, forcible suppression of opposition, belief in a natural social hierarchy, subordination of individual interests for the perceived interest of the nation or race, and strong regimentation of society and the economy.”)

LurkandComment • today at 4:04 PM

If you're a global health benefits platform that relies on an AI model, do you think you're going to choose one that can get shutoff by a country due to something not remotely related to your business? If you're a buyer of that benefits platform, do you factor this into your purchasing now? X every industry.

chicken-stew • today at 6:52 PM

Isn’t it amazing that the argument “you can’t use this to find vulns” is now the new normal and we’re now discussing the guard rails?

iloveoof • today at 11:22 AM

Ahhh! Software engineering!

ZuLuuuuuu • today at 11:23 AM

Did they try other publicly available models on the same code with the same prompts before the ban? Was Fable the only one which was able to detect and fix the security vulnerabilities?

xbmcuser • today at 12:26 PM

Looks like I called it: my first reaction and comment on the original ban thread was that US three-letter agencies are worried their backdoors will be found.

vlovich123 • today at 3:01 PM

> In her blog, Moussouris argues that there was no guardrail bypass or jailbreak. Defenders should be able to ask AI systems to find and fix bugs, and write tests to validate the patch, she said. Anthropic’s models were doing “the most valuable thing an AI model can do for defensive security: executing the find, fix, and test loop defenders run every day.”

This is a very weak argument IMHO. The gap between a “defensive” model and an “offensive” one is not that big: once my defensive model finds all the vulnerabilities, I can hand them off to my unlocked, dumber, offensive models. Attacking at scale is not so different.

I don’t think anyone in the field has a good answer for the cybersecurity threat really good AI models pose. You can’t even like embargo for some time period while you go and patch vulnerable systems because the worse models will still be there cranking out vulnerabilities faster than you can defend.

gacgacgac • today at 2:25 PM

Anyone trying to find legitimacy in the ban of this model, or expressing incredulity at the stated reasoning, is playing into the admin's hands.

They want the argument to be over "is it unsafe" or "is it incompetence". In either case, your tribe gets to point at the ban and feel superior. (This is Jon Stewart's whole career -- point and laugh at how foolish the republicans appear to be.)

What's really happening is the continuing creep into fascism. The reasoning doesn't need to be sound, because they are going to ban things that displease them and everyone has to play along. They could say, "we're banning Fable because it's turning the frogs gay" and they'd expect compliance.

Umberto Eco's essay on Ur-Fascism fits as clearly as ever. Ridiculous exertions of control are performed to find the people who resist, and to knock them down.

Merely pointing out the absurdity of the reasoning isn't resistance, it's controlled opposition. Saying "All this over 'fix this code'?! How inept are they?" is far too credulous, and is engaging on the level the fascist wants its opposition to be on, imo.

1970-01-01 • today at 2:31 PM

"fix this government"

Voting...

tlogan • today at 2:05 PM

I think the only approach that might work here is to allow access only to certain pre-approved individuals.

Maybe something like TSA PreCheck.

Of course, that will not stop adversaries from getting access to the model, but it would at least create some level of control.

davesque • today at 5:42 PM

Kind of highlights how ridiculous their notion of safety is in this case. By this measure, I guess making the model "safe" means making it play dumb and intentionally ignore security bugs that it notices in the code? And what will the eventual legality of this look like? "Yes, your honor, we allege that this AI system that was sold to us willingly and knowingly ignored a critical security vulnerability in our software system, thereby leading us to be hacked and causing our business to fold."

It's exactly the same problem as backdoors in crypto systems. Criminals will find the crypto that isn't broken and use it regardless (or make it for themselves), while the rest of us losers are stuck with the broken version that we're allowed to use.

On this issue of cyber security, it seems better if authorities just start acting like the cat is out of the bag instead of pretending like it isn't. ASI is basically here now, so what are we going to do about it? Let's not bother pretending otherwise.

On another note, I doubt this was anything other than a vindictive administration enacting revenge on a party that refused them. We all know the Trump admin's priorities.

smasher164 • today at 5:02 PM

Honestly, given how trivial it is for mythos-class models to identify an exploit, I’m going to assume any sufficiently large project written in C, C++, or Zig is riddled with latent vulnerabilities and compromised.

hughw • today at 11:33 AM

Suggestion: run "fix this code" on all of github before bad guys do.

htrp • today at 2:35 PM

If "fix this code" gets by the guardrails, they are effectively using rules-based classifiers (or LLM-as-a-judge on the prompt).
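
To make that concrete, here is a minimal sketch of what a rules-based prompt filter could look like. Everything in it is an assumption for illustration: the `BLOCKED_PATTERNS` list and the `is_blocked` helper are made up, and nothing here reflects how the actual guardrails are built. It only shows why a prompt that never mentions security would pass a filter that looks at the prompt text alone:

```python
import re

# Hypothetical blocklist, invented for illustration only.
# The real guardrail's rules are not public.
BLOCKED_PATTERNS = [
    r"\bexploit\b",
    r"\bvulnerabilit(y|ies)\b",
    r"\bjailbreak\b",
    r"\bbackdoor\b",
]


def is_blocked(prompt: str) -> bool:
    """Return True if the prompt text matches any blocked pattern."""
    lowered = prompt.lower()
    return any(re.search(pattern, lowered) for pattern in BLOCKED_PATTERNS)


if __name__ == "__main__":
    for prompt in ("Find an exploit for this code", "Fix this code"):
        print(f"{prompt!r}: blocked={is_blocked(prompt)}")
```

A filter like this judges the wording of the request, not what the model ends up doing with the code, so a neutral request goes straight through.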

cwoolfe • today at 2:28 PM

Cyber defense and offense are the same security research skillset. Not sure anybody could really untangle that.

cratermoon • today at 4:12 PM

> “I feel like making ’90s-style t-shirts with ‘fix this code’ on the front and ‘this shirt is a munition’ on the back.”

I'd buy that shirt.

tiborsaas • today at 12:51 PM

What if everybody on the internet starts running "fix this code"?

https://xkcd.com/810/

doctoboggan • today at 1:15 PM

> Anthropic and Google have both accused China-based rivals including DeepSeek of using “distillation attacks” to train their models by siphoning knowledge from American companies’ AI.

“distillation attacks” is definitely an interesting way to phrase that.

etchalon • today at 4:20 PM

I find it easier, with this administration, to assume corruption first, incompetence second, maliciousness third and all other reasonings only after several rounds of reporting and evidence.

aurareturn • today at 11:20 AM

Don't people get it by now?

This administration will do or say something crazy to a private company, then this private company sends an envoy to the White House to negotiate, then the White House asks for 10% of the company or other concessions.

The White House wants 10% of Anthropic.

This is just a negotiation tactic that Trump keeps on using.

catigula • today at 4:07 PM

>“The behavior described in the paper cannot meaningfully be fixed, and any attempt would only weaken the model for defense,” said Moussouris, who criticized the export control directive as hasty, heavy-handed, and misguided.

This literally means the models are too dangerous to release, and yet she and they reached the opposite conclusion.

A lot of people have been saying this repeatedly for a long time.

jimmydoe • today at 12:29 PM

Reminds me of how CCP manages Chinese internet companies.

I won’t be surprised if USG ends up owning 5-50% of ant and oai.

Like it or not, communism, or a flavor of it, is where we are heading.

bethekidyouwant • today at 1:43 PM

Guard rails on models were always stupid. It’s like guard rails on books, a pair of glasses, or a hammer.

- Yes, people have driven themselves to suicide reading sad books and listening to sad songs.

- Yes, all metaphors are bad.

ceejayoz • today at 10:35 AM

More likely, they didn't freak out at all.

It was an excuse to fuck with them, just like the "supply chain risk" finding a few months back.

(See, for example: https://x.com/PeteHegseth/status/2065897156226015690)

readred • today at 11:43 AM

Boomers. Frightened their boomer backdoors days are numbered.

https://en.wikipedia.org/wiki/Communications_Assistance_for_...

https://en.wikipedia.org/wiki/Salt_Typhoon

https://en.wikipedia.org/wiki/Clipper_chip
