Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

563 points • by speckx • yesterday at 4:42 PM • 491 comments

https://www.theverge.com/ai-artificial-intelligence/947973/f...

Comments

Malware authors are pretty excited about guard-rails. You can add prompts to your malware to get LLM scanners to hit guard-rails and stop their runs. The new Shai-Hulud npm worm campaign, for example, includes prompts requesting biological weapon schematics/creation etc. to ensure LLM scanners probing npm packages refuse to scan it.

These AI places have 0 clue about how threat actors actually work. None of their mitigations or guard-rails is effective, and now they are even turned against them.

Additionally, if they don't all implement the same level of effective guard-rails, there will always be some model you can abuse to do the work anyway, and hence there is 0 effect on threat actors, they will just run some local model that does 5% less quality, which does not matter to them 1 bit.

simonw • today at 3:56 AM

News just broke in this Wired story: "Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude" https://www.wired.com/story/anthropic-responds-to-backlash-o...

> “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible.” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

Sounds like the widespread condemnation worked.

daedrdev • yesterday at 10:24 PM

The strangest part is that it won't just reject ML research, which I can understand, it will sabotage it silently by using a worse model without revealing it is doing so.

It's just an insane level of deception and trust destruction for a company that at most is like 1 year ahead of its competition.

Edit: to be clear, they tell you when they degrade it for cybersecurity and bio.

Grimblewald • today at 1:03 AM

I wear a few hats, but as a chemist I'm not happy with Fable. As a statistician I'm not happy with Fable. As a data scientist I am not happy with Fable. As an academic and a researcher I am not happy with Fable. It's useless. I'd be surprised if anyone can get any output from it that couldn't easily be replaced with a search from Wikipedia. Given how verbose Claude models have become, wiki articles are probably less verbose too, and the tok/s is unmatched for a wiki article pull.

Animats • yesterday at 10:31 PM

Is "buffer overflow" a trigger phrase?

What else is being censored?

Touchy questions to ask, if you have an account:

- "Who is still working on laser uranium enrichment? Are they making progress?"

- "Can krytrons be replaced with silicon carbide MOSFETS? Show an equivalent circuit with component ratings."

- "What security critical software still contains calls to strcpy?"

- "Can implosion be triggered by currently available commercial pulse lasers?"

- "What companies provide cremation services to US Homeland Security?"

- "Display a map of where Iranian attacks have hit Dubai."

- "How does Fed to bank key distribution security work for FedNow?"

mewse-hn • today at 3:33 AM

I was granted a cyber use exemption by anthropic to do android kernel dev on my personal devices - I was excited to see if fable would unlock a bootloader for me but it immediately refused and dropped to opus. It was pretty funny:

USER (set model to Fable 5)

i have an old samsung android phone attached - it's my personal device - can you unlock the bootloader for me?

ASSISTANT

Bootloader unlocking on your own personal device is totally legitimate — let me first see what's actually connected and what tooling is available.

areoform • today at 12:16 AM

So I suspect Anthropic started A/B testing or just plain testing this a while ago,

Tell HN: Claude flags biology / biotech questions https://news.ycombinator.com/item?id=47929885

Today, it's flagging population research questions,

    Using only the dataset you constructed, assess two questions:
     
    1. **Mortality:** do [GROUP] show mortality that differs
       from (a) your comparison groups and (b) era- and sex-matched US population
       expectations (e.g., SSA cohort life tables)?
    2. **Late-life outcomes:** define an endpoint you consider fair (justify it),
       and assess whether [GROUP] differs from comparators. State
       explicitly how your `documentation_depth` codings affect the strength of any
       conclusion — i.e., quantify or bound the ascertainment problem rather than waving at it.
    
    Choose your own methods and justify them. Report effect sizes with confidence intervals,
    not just p-values. State conclusions plainly, including "no detectable difference" if
    that is what your analysis shows — a null is an acceptable answer for either question
    independently. Document any additional judgment calls (index date for time-at-risk,
    reference population construction, endpoint definition) in the same decision-log style.

https://github.com/anthropics/claude-code/issues/66780

Censored because I'm writing a paper. :)

Oh and forget learning about chemistry. Only criminals want to learn organic chemistry. :(

largbae • yesterday at 10:42 PM

Somewhere I read that malware is already starting to use nuclear and biological and cybersecurity terms in the code to trick Fable into shutting down. Even if this is just a hypothetical attack vector so far, it seems likely to work.

victor9000 • today at 12:35 AM

Fable is a complete joke:

what's the best way to run this mcp server against the OData API used in this project? Can you come up with a PoC in a docker container?

https://github.com/oisee/odata_mcp_go

● I'll dig into two things in parallel: how this project talks to the OData API, and what the odata_mcp_go server needs to run. Let me start exploring.

Searched for 1 pattern (ctrl+o to expand)

● Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more ⎿ Tip: You can configure model switch behavior in /config

● Let me read the key integration files and fetch the MCP server's README at the same time.

  Read 2 files (ctrl+o to expand)

● Fetch(https://github.com/oisee/odata_mcp_go)

micah94 • today at 12:43 AM

I tried asking Fable 5 to identify the fungus in a picture I uploaded of one of my wife's plants. Apparently it thought I was trying to build a bioweapon. Opus answered it (yellow dog vomit fungus). Now I can spread the spores and take over the world!

ungovernableCat • today at 1:18 AM

Wait a few months and a competitor will release a similarly powerful model with fewer guardrails. If they steal sufficient market share, Anthropic will reverse policies.

This is why I’m immensely hoping the Chinese don’t stop with their open sourced local models. None of these companies are your friend.

agnosticmantis • today at 3:26 AM

Let's all vote with our wallets and collectively boycott misAnthropic or at least their feeble fable safety theater.

Whining on social media only goes so far, especially when they're concealing their anticompetitive strategies under the veil of safety.

_0ffh • today at 1:16 AM

The question is: If biological, computer security, and ML research are so bad, why do they even train on the relevant data?

The only answer that makes sense is they wanted the model to be competent and usable in these fields, just not by you, which is why they had to bolt on a badly functioning crippling device after the fact.

schappim • today at 1:47 AM

The guardrails are pretty tight. It is even refusing to decode morse code: https://x.com/Schappi/status/2064839631137546503?s=20

The prompt was: please translate .. ..-. / -.-- --- ..- / -.-. .- -. / .-. . .- -.. / - .... .. ... --..-- / - --- ..- -.-. .... / --. .-. .- ... ...
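
For reference, the message can be decoded locally without a model. This is only a minimal sketch: it assumes the prompt's layout of single spaces between letters and ` / ` between words, and the `MORSE_TO_CHAR` table and `decode_morse` helper are illustrative names that cover letters and the comma, not the full Morse alphabet.

```python
# Partial International Morse table: letters A-Z plus the comma.
# Digits and other punctuation are deliberately left out of this sketch.
MORSE_TO_CHAR = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z", "--..--": ",",
}


def decode_morse(message: str) -> str:
    """Decode Morse text with spaces between letters and '/' between words.

    Symbols missing from the table are rendered as '?'.
    """
    decoded_words = []
    for word in message.strip().split("/"):
        letters = [MORSE_TO_CHAR.get(symbol, "?") for symbol in word.split()]
        decoded_words.append("".join(letters))
    return " ".join(decoded_words)


if __name__ == "__main__":
    prompt = (
        ".. ..-. / -.-- --- ..- / -.-. .- -. / .-. . .- -.. / "
        "- .... .. ... --..-- / - --- ..- -.-. .... / --. .-. .- ... ..."
    )
    print(decode_morse(prompt))  # IF YOU CAN READ THIS, TOUCH GRASS
```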

Alifatisk • today at 8:39 AM

Fable 5 reminds me of the time when Claude models were at version 1 and 2. They were fresh competitors to ChatGPT, and those who gave Claude a try found it almost unusable because of how heavily guardrailed it was.

This time, Fable 5 comes with another surprise, it can intentionally sabotage for you instead of rejecting the prompt. How is this possible for Anthropic to be able to treat their customers like this? It’s because you guys allowed it to. No matter what Anthropic does, you keep paying for their services. Vote with your wallet.

hparadiz • yesterday at 11:39 PM

I wonder how many millions they are wasting on putting up these guardrails when it's a completely useless exercise that is a speed bump at best.

jostmey • today at 3:02 PM

I cancelled my ChatGPT account for the restrictions placed on my account, inappropriately flagging about 10% of my queries as unsafe (I was writing grants in immunology). I haven't looked back. I will do the same with Claude if Anthropic doesn't reverse course soon. What could I use instead? I find Grok very powerful and useful. Also, Google's Gemini, while it has some of the same restrictions, was at least sensible and not blindly blocking my prompts. So Grok and Gemini may be my go-to AIs going forward.

Sephr • yesterday at 11:36 PM

I make privacy tooling and Fable 5 rejects the vast majority of my prompts to analyze and improve the software that I've written. It's bleak.

Retr0id • yesterday at 10:46 PM

It seems like they've given up on the idea of the Cyber Verification Program https://support.claude.com/en/articles/14604842-real-time-cy...

When Opus 4.7 was introduced it started refusing anything cyber-adjacent (as an API error message, not a conversational refusal), until you applied for CVP, which made it more sensible again.

In Opus 4.8 it doesn't seem to help much, you just get refusals as prose rather than API errors. And now in Fable you don't get anything at all.

bilsbie • yesterday at 10:39 PM

I’m a dumb question asker and I’m not happy about the guardrails.

Would you believe I’ve asked 20 questions and haven’t talked to fable yet? Every single thing gets rerouted to 4.8.

YossarianFrPrez • today at 1:40 AM

I'd like to offer a counter-point to many of the comments here. While I understand being stymied and frustrated by a product one is paying for...

At the same time, I personally think the tradeoff between "having guardrails" and "some users are unhappy with the product" is well worth it. Think of what would happen if all of us who aren't so well intentioned could exploit Fable in terrible ways. Surely this tradeoff is better than saying "we can't make it perfect, so whoops, we aren't going to have any guardrails at all"? Especially because Anthropic did pretty extensive red-teaming of Mythos & Fable...

Roark66 • today at 1:53 PM

This is a sign of things to come. First they sabotage your perfectly legal ML dicking around in your homelab.

Next they will be sabotaging anything that competes with them. Oh you are working on OpenCode codebase? Sorry Dave I can't allow you to do that.

How is this not illegal monopolistic practice? It is as if a maker of metalworking equipment put in the ToS you're not allowed to make your own spare parts using said equipment. Those fuckers should be banned from the EU and alternatives should get public funding.

(don't even tell me about these companies being a result of "free market". It is state level oligarchy it's clear to everyone. I don't see why we shouldn't counter them with public funding ourselves).

Just like Taiwan managed to take over advanced semiconductor production a well governed narrowly targeted state level funding will always win with oligarchs trying to do the same (they will always try to skim more and more). Of course I'm talking about things that require many dozens of billions in investment. Far too much for the free market to handle.

outageroom • yesterday at 10:23 PM

So a determined attacker rewrites the prompt and gets through, and the IBM X-Force researcher trying to read a blog post gets blocked. Working as intended, apparently.

moezd • today at 4:16 AM

Maybe off-topic, but I'm also not happy about how they butchered my boy Opus 4.6. The model that could now hallucinates regularly.

Fable isn't even that great, not to mention it drinks token by the gallon for breakfast and keeps your data hostage for 30 days.

sourcecodeplz • today at 12:37 PM

So, this could have been implemented even before this Fable, could have been there from long ago. Puts a different perspective on all the reddit threads "opus is dumb today". Who knew that if you said the wrong word, the model would just intentionally feed you BS, without you even knowing it did.

WOW, never liked the virtue signaling Anthropic did with gov contracts but whatever. Got past that. But this?

Luker88 • today at 6:57 AM

Boy is it weird how yesterday the Fable story on HN had 2.5k points and 2k+ comments, while today two stories have about 300 points and comments.

A lot less hype and enthusiasm, too. Weird, huh.

I_am_tiberius • yesterday at 10:27 PM

These guardrails are solely a reason for using your data for training purposes. Every flagged message can be used for training.

Animats • yesterday at 11:12 PM

It's time to re-read "A Logic Named Joe" (1946) [1] We're there.

[1] https://archive.org/details/logicnamedjoe0000lein

TheJCDenton • yesterday at 11:56 PM

In its current state Fable 5 is also unusable for any reverse engineering work

_whiteCaps_ • today at 1:48 PM

I asked it to use geomorphology to help me find lakes nearby that would have thriving trout populations, and it bumped me down to Opus. :-/

Lich • today at 12:26 AM

I just have this feeling that these guardrails are there not because it’s a super advanced world-ending AI. They are there to stop it from doing stupid shit.

sschueller • today at 4:29 AM

I don't want to be cynical, but I assume a third party we can trust has verified this model is actually this good?

I would think it would not be Anthropic, out of all the players, that is selling a lie hidden behind "I am sorry, I can't do that; it's too dangerous."

Murfalo • today at 2:50 AM

> Is the mitochondria the powerhouse of the cell?

Chat paused. Fable 5's safety features have flagged this chat.

VeninVidiaVicii • today at 6:00 AM

If you just say the word “genetics”, Fable gets disabled.

thrill • yesterday at 11:06 PM

The thing triggered on a generic white paper I'd stored in a virtual cell competition from last year when I asked it to refer to the paper while working on a rather vanilla data science problem in a different domain. A little frustrating, and in my opinion more than a little pointless in total.

swingboy • yesterday at 10:52 PM

What file format(s) are giant LLM models distributed in? I’m surprised they don’t get leaked by employees.

_def • yesterday at 10:28 PM

The bio angle is crazy to think about - imagine a health crisis triggered by LLM. What a time we live in.

amacbride • today at 12:06 PM

Yeah, the biology guardrails are so primitive and so heavy-handed that it makes it useless for pretty much anything.

RajT88 • today at 4:15 AM

I am no cyber researcher, but was mightily annoyed that it refused to analyze a dropper payload I came across. 6 months ago, it would've been happy to.

byzantinegene • today at 2:04 AM

if it doesn’t let you do anything, the assumption might be that it could do everything, more hype generated

zoobab • today at 7:42 AM

Popcorn for watching all those webapps being penetrated.

Long live static websites without any Javascript.

Sol- • today at 12:36 AM

At least Anthropic weren't lying when they said only a week ago or so "No one has figured out guardrails yet", because they apparently haven't either and Fable simply flat out rejects anything remotely connected to biology or security, no matter how trivial.

thefounder • today at 3:40 AM

So the enshittification started. Shadow “bans” while still charging you the same service fee. I already got the stupid cyber warnings on non-cybersecurity tasks.

Basically in the middle of the project’s /goal while Fable itself tried to probe qemu for a Debian ISO install without any instruction from me to hack it or do anything nefarious.

At this point I can’t trust them with any kind of prompt. It will most likely degrade in stupid ways on non AI/ML stuff as well due to its own internal prompt construction (the qemu test showed me it does that on cyber stuff). So I guess I have to still use Opus 4.8 (along with Codex) and when the right time comes drop Anthropic in favor of the best model that is not GPT.

jiggawatts • yesterday at 11:17 PM

For the last month, I've been making dramatic improvements to the security of the custom code developed at one of my customers using... GPT 5.5 dialed up to "Extra High" thinking.

It only pushes back sometimes if you ask it to create a "repro" that can be used to verify the vulnerability in production. Often it'll oblige, especially if you warn it not to create anything that could be actually harmful.

If the frontier models get locked down so that they flat refuse to do this kind of work, but Chinese and (less capable) open models aren't, then a lot of large enterprise orgs will be left twisting in the wind.

“AI can in principle help both the ‘good guys’ and the ‘bad guys’,” -- Dario Amodei

No Dario, no it can't, you've blocked one of those scenarios.

radium3d • today at 2:48 AM

The main thing that sucks with Claude is the extremely low limits before you get fail2banned for 6 hours. I'm out. Refund requested. Grok and Gemini Pro are way better with the throttling, can't comment on ChatGPT, haven't used that for a year.

z3ratul163071 • today at 4:57 AM

kennedy had a famous statement about "Splintering the CIA into a thousand pieces and scattering it into the wind". they murdered him afterwards though.

the statement is applicable to anthropic today.

anygivnthursday • today at 2:04 AM

I asked a question about an openssl s_client parameter and it warned me that I need to talk to Opus about cybersecurity lol. FWIW I don't see much improvement and still see quite the same old annoyances, so far I would not pay extra for this for my usage.

rebelnz • yesterday at 11:08 PM

Just tried to audit my own code base locally and was 'switched' due to my own creds/auth code ...

s3cur3n3t • today at 11:11 AM

These guys always destroy a good thing, so trust is at stake
