Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

569 points • by speckx • yesterday at 4:42 PM • 499 comments

https://www.theverge.com/ai-artificial-intelligence/947973/f...

Comments

Is the answer requiring licensing for certain use cases for AI? If you're asking questions that involve synthesising or modifying biologics, or anything that looks like cybersecurity research, you need to tie your real ID to the account?

Lammy • yesterday at 11:04 PM

I really hate the term “guardrails” for these limitations, since the purpose of a guardrail is to protect me, but these limitations exist to protect Anthropic.

simonmorley • today at 9:05 AM

I’m on their CSP and can’t even get it to update my website. It’s totally unusable rn.

s3cur3n3t • today at 11:11 AM

These guys always destroy a good thing, so trust is at stake

6thbit • today at 12:53 AM

Would it be a costly process for Anthropic to re-tune those guardrails? A re-training sort of cost, or a coding-session sort of cost?

luxuryballs • yesterday at 11:57 PM

I can’t help but think that gimping itself for “security” is a marketing ruse and it’s not actually as “dangerous” as they want people to think it is.

matt-p • today at 12:31 PM

They are never happy :)

lwhi • today at 6:08 AM

If a product is genuinely dangerous to society, self-regulation cannot be a suitable harness.

If only we had effective governments that could regulate industry.

If a nuclear weapon were developed today, would it be down to industry to self-regulate?

aleksandrm • today at 12:16 AM

It refuses to do any legitimate work that it thinks could be even remotely related to "cybersecurity"; it won't even read my Docker app logs to try and troubleshoot a problem. Absolute garbage!

siva7 • yesterday at 11:04 PM

Fable is utterly useless with those guardrails for any serious IT or life science work. Anthropic fucked me once a few months ago by closing down the subscription for any other harness, and now it has fucked me twice: I bought a subscription again only to find out their hyped model is unusable for normies. Using their products feels like a constant battle instead of a productive work day. Compare that with OpenAI: not once did I feel like I was fighting against Codex. Never again, Anthropic.

Bassiestroep • today at 8:04 AM

I mean, a lot of people were let into the CVP, and I bet the people in there did a bunch of good. Fable 5 could do the exact same but better. There's more good out there than bad.

jazz9k • yesterday at 4:52 PM

DeepSeek is the only one that I can directly ask about vulnerabilities and it will give me a PoC. Although not as good as others, it has helped me with security research.

The rest have guardrails so heavy that they are almost useless for cybersecurity.

casey2 • today at 5:45 PM

This is a pretty basic manipulation tactic. Be super shitty to your users and then roll back the abuse. The correct response is to not engage with shitty abusive dickheads.

Goofy_Coyote • today at 2:34 AM

It even refuses to read my resume, so... yeah

coolfox • today at 6:13 AM

Funny how Wired got the masses of the internet on board with hating AI, helping to spark the whole anti-AI movement, and people still continue to rely on them for their understanding of AI and current events.

I feel like they report in a vacuum. Take this anti-exfil policy for Claude: it was plainly explained as part of the launch of Anthropic's new product. Security like this isn't novel, it isn't bad, and you don't explain how your security works to the people you're securing against. Nobody freaks out about Steam's VAC ban system, and no one is investigating Gmail's spam filtering, Reddit's vote fuzzing, Cloudflare's bot detection, or Vercel for blocking proxying services.

What's really the distinguishing principle? Is it really just not liking Anthropic's opinions? Then just say that and use a different LLM. Chemists, biologists, and AI researchers cry a river lmao

neuroelectron • today at 10:36 AM

This is clearly advertising. But that's OK. OpenAI does the same thing.

dcl • today at 12:19 AM

Deliberately producing misaligned and deceitful AI systems now. Great.

andrewstuart • today at 1:56 AM

Stupid security theater. The only thing that makes sense would be zero restrictions.

andy_ppp • today at 4:32 AM

I said I wondered if the models were going to start poisoning distillation and I got downvoted to hell. It’s interesting to me that they are now downgrading ML research too in this model; I would argue this implies the terrifying, impossible-to-reason-about, self-improving AI doom loop is coming sooner rather than later. Bit worrying.

ni5arga • today at 9:37 AM

Fable has been pretty disappointing for security research. It downgrades itself to Opus 4.8 even when you ask it questions about basic things like port scanning.

SXX • today at 2:17 AM

Software engineers shouldn't be happy either. If the model silently sabotages cybersecurity research on other people's software, there is absolutely no way to be sure it won't sabotage the security of the AI slop code it generated yesterday.

This is a bad precedent, and no one wants to pay X to generate code and then pay X*10 to figure out why their company just got hacked.

jongjong • yesterday at 10:53 PM

It's frustrating as someone who has worked hard to produce succinct, secure software that I can't use it to prove my software's correctness but big companies with insecure code can use it to fix their tangled mess.

I already tested all earlier models against all my open source projects and they have yet to find a vulnerability, so I'm keen to try out Mythos.

I've been waiting to be vindicated for years and finally we have a tool which can do it with high confidence but I don't have access.

Also, my code is minimal and highly succinct so it would prove correctness with even more confidence since each library/module and integration fully fits in the context window.

Like the Protobuf.js fiasco is just pure vindication for me because I was being looked down upon for choosing JSON as the interchange format. Turns out their software was insecure all this time... With a literal remote code execution vulnerability!

rdiddly • yesterday at 11:18 PM

It's a marketplace. Someone else will outdo this inferior product.

ChrisArchitect • today at 6:06 AM

Related development:

Anthropic Walks Back Policy That Could Have 'Sabotaged' Researchers Using Claude

https://www.wired.com/story/anthropic-responds-to-backlash-o...

(https://news.ycombinator.com/item?id=48485958)

ChrisArchitect • today at 3:13 AM

More discussion:

If Claude Fable stops helping you, you'll never know

https://news.ycombinator.com/item?id=48467896

and Related:

Claude Fable 5

https://news.ycombinator.com/item?id=48463808

thefounder • today at 9:18 AM

This is what the Anthropic CEO has been cooking all this time with his safety BS.

sscaryterry • today at 11:42 AM

It's expensive, and it's shit, period.

varispeed • today at 12:46 AM

Surely if they are sabotaging the output, they shouldn't charge the same fee for tokens as if the output was not sabotaged?

This is looking like something for a regulator to look at, and probably a class action lawsuit in the making.

I think people should be getting refunds. Including for shenanigans with Opus.

teaearlgraycold • today at 12:27 AM

I'm being careful with it, but I haven't had Fable reject requests to "harden" my code or "find issues" in auth-related modules, which you could use on someone else's code to find vulnerabilities.

notepad0x90 • yesterday at 11:30 PM

I think Anthropic is playing too fast and loose with the whole "no publicity is bad publicity" schtick.

m3kw9 • today at 1:11 AM

Could it now start to add unnoticeable security holes to your system if you start writing security-type code?

felixgallo • yesterday at 10:49 PM

This is a clickbait article with a garbage title. From the actual article, the one quoted cybersecurity researcher is sane about it:

“But it is understandable as we are still in the early days and they are still adapting their guardrails. I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies,” said Suiche, who is a member of the technical staff at Tolmo, an AI cybersecurity startup. “It’s better to catch more people than not enough when you do such a release and to relax the guardrails over time.”
