loneboat • yesterday at 10:37 PM • 4 replies

I've seen this claim a few times, but when I triggered the guardrails in Claude Code, it clearly notified me that it had switched to a different model ("something something for security purposes...").

Are you using Fable in Claude Code or in the browser?

Replies

vadansky • yesterday at 10:42 PM

It's from the model card:

> unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

(stolen from https://jonready.com/blog/posts/claude-fable5-is-allowed-to-...)

mips_avatar • yesterday at 10:57 PM

They've said that they'll stop notifying developers when this gets triggered; instead, they'll load in what's basically a LoRA designed to inject bugs into your code.

ComputerGuru • yesterday at 10:41 PM

Different restrictions. ML gets treated differently from the rest.

daedrdev • yesterday at 10:43 PM

Specifically, only ML research.