Hacker News

Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

569 points by speckx yesterday at 4:42 PM | 499 comments

https://www.theverge.com/ai-artificial-intelligence/947973/f...


Comments

JumpCrisscross today at 12:27 AM

Is the answer requiring licensing for certain use cases for AI? If you're asking questions that involve synthesising or modifying biologics, or anything that looks like cybersecurity research, you need to tie your real ID to the account?

Lammy yesterday at 11:04 PM

I really hate the term “guardrails” for these limitations, since the purpose of a guardrail is to protect me, but these limitations exist to protect Anthropic.

simonmorley today at 9:05 AM

I’m on their CSP and can’t even get it to update my website. It’s totally unusable rn.

s3cur3n3t today at 11:11 AM

These guys always destroy a good thing, so trust is at stake

6thbit today at 12:53 AM

Would it be a costly process for Anthropic to re-tune those guardrails? Like, a re-training sort of cost, or more like a coding-session sort of cost?

luxuryballs yesterday at 11:57 PM

I can’t help but think that gimping itself for “security” is a marketing ruse and it’s not actually as “dangerous” as they want people to think it is.

matt-p today at 12:31 PM

They are never happy :)

lwhi today at 6:08 AM

If a product is genuinely dangerous to society, self-regulation cannot be a suitable harness.

If only we had effective governments that could regulate industry.

If a nuclear weapon were developed today, would it be down to industry to self-regulate?

aleksandrm today at 12:16 AM

It refuses to do any legitimate work that it thinks could be even remotely related to "cybersecurity". It won't even read my Docker app logs to try and troubleshoot a problem. Absolute garbage!

siva7 yesterday at 11:04 PM

Fable is utterly useless with those guardrails for any serious IT or life science work. Anthropic fucked me once a few months ago by closing down the subscription for any other harness; now it has fucked me a second time, since I bought a subscription again only to find out their hyped model is unusable for normies. Using their products feels like a constant battle instead of a productive work day. Compare that with OpenAI: not once did I feel like I was fighting against Codex. Never again, Anthropic.

Bassiestroep today at 8:04 AM

I mean, a lot of people were let into the CVP. I bet the group of people in there did a bunch of good; Fable 5 could do the exact same but better. There's more good out there than bad.

jazz9k yesterday at 4:52 PM

DeepSeek is the only one that I can directly ask about vulnerabilities and it will give me a PoC. Although not as good as others, it has helped me with security research.

The rest have guardrails so heavy that they are almost useless for cybersecurity.

casey2 today at 5:45 PM

This is a pretty basic manipulation tactic. Be super shitty to your users and then roll back the abuse. The correct response is to not engage with shitty abusive dickheads.

Goofy_Coyote today at 2:34 AM

It even refuses to read my resume, so... yeah

coolfox today at 6:13 AM

Funny how Wired got the masses of the internet on board with hating AI, helping to spark the whole anti-AI movement, and people still continue to rely on them for their understanding of AI and current events.

I feel like they report in a vacuum. Take this anti-exfil policy for Claude: it was plainly explained as part of the launch of Anthropic's new product. Security like this isn't novel and it isn't bad; you don't explain how your security works to the people you're securing against. Nobody freaks out about Steam's VAC ban system, and no one is investigating Gmail's spam filtering, Reddit's vote fuzzing, Cloudflare's bot detection, or Vercel for blocking proxying services.

What's really the distinguishing principle? Is it really just not liking Anthropic's opinions? Then just say that and use a different LLM. Chemists, biologists, and AI researchers cry a river lmao

neuroelectron today at 10:36 AM

This is clearly advertising. But that's OK. OpenAI does the same thing.

dcl today at 12:19 AM

Deliberately producing misaligned and deceitful AI systems now. Great.

andrewstuart today at 1:56 AM

Stupid security theater. The only thing that makes sense would be zero restrictions.

andy_ppp today at 4:32 AM

I said I wondered if the models were going to start poisoning distillation and I got downvoted to hell. It’s interesting to me that they are now downgrading ML research too in this model. I would argue this implies the terrifying, impossible-to-reason-about self-improving AI doom loop is coming sooner rather than later. Bit worrying.

ni5arga today at 9:37 AM

Fable has been pretty disappointing for security research. It downgrades itself to Opus 4.8 even when you ask it questions about basic things like port scanning.

SXX today at 2:17 AM

Software engineers shouldn't be happy either. If the model silently sabotages cybersecurity research on others' software, there is absolutely no way to be sure it won't sabotage the security of the AI slop code it generated yesterday.

This is a bad precedent, and no one wants to pay X to generate code only to pay X*10 to figure out why their company just got hacked.

jongjong yesterday at 10:53 PM

It's frustrating, as someone who has worked hard to produce succinct, secure software, that I can't use it to prove my software's correctness, while big companies with insecure code can use it to fix their tangled mess.

I already tested all earlier models against all my open source projects and they have yet to find a vulnerability, so I'm keen to try out Mythos.

I've been waiting to be vindicated for years and finally we have a tool which can do it with high confidence but I don't have access.

Also, my code is minimal and highly succinct so it would prove correctness with even more confidence since each library/module and integration fully fits in the context window.

Like the Protobuf.js fiasco is just pure vindication for me because I was being looked down upon for choosing JSON as the interchange format. Turns out their software was insecure all this time... With a literal remote code execution vulnerability!

rdiddly yesterday at 11:18 PM

It's a marketplace. Someone else will outdo this inferior product.

ChrisArchitect today at 6:06 AM

Related development:

Anthropic Walks Back Policy That Could Have 'Sabotaged' Researchers Using Claude

https://www.wired.com/story/anthropic-responds-to-backlash-o...

(https://news.ycombinator.com/item?id=48485958)

ChrisArchitect today at 3:13 AM

More discussion:

If Claude Fable stops helping you, you'll never know

https://news.ycombinator.com/item?id=48467896

and Related:

Claude Fable 5

https://news.ycombinator.com/item?id=48463808

thefounder today at 9:18 AM

This is what the Anthropic CEO has been cooking all this time with his safety BS.

sscaryterry today at 11:42 AM

It's expensive, and it's shit, period.

varispeed today at 12:46 AM

Surely if they are sabotaging the output, they shouldn't charge the same fee for tokens as if the output was not sabotaged?

This is looking like something for a regulator to look at, and probably a class action lawsuit in the making.

I think people should be getting refunds. Including for shenanigans with Opus.

teaearlgraycold today at 12:27 AM

I'm being careful with it, but I haven't had Fable reject requests to "harden" my code or "find issues" in auth-related modules, which you could use on someone else's code to find vulnerabilities.

notepad0x90 yesterday at 11:30 PM

I think Anthropic is playing too fast and loose with the whole "no publicity is bad publicity" schtick.

m3kw9 today at 1:11 AM

Could it now start to add unnoticeable security holes into your system if you start writing security-type code?

felixgallo yesterday at 10:49 PM

This is a clickbait article with a garbage title. From the actual article, the one quoted cybersecurity researcher is sane about it:

“But it is understandable as we are still in the early days and they are still adapting their guardrails. I am sure they are going to evolve over time as Anthropic and other frontier model companies will collaborate more with the current new generation of cybersecurity companies,” said Suiche, who is a member of the technical staff at Tolmo, an AI cybersecurity startup. “It’s better to catch more people than not enough when you do such a release and to relax the guardrails over time.”
