Hacker News

loneboat yesterday at 10:37 PM, 4 replies

I've seen this claim a few times, but when I triggered the guardrails in Claude Code, it clearly notified me that it had switched to a different model ("something something for security purposes...").

Are you using Fable in Claude Code or in the browser?


Replies

vadansky yesterday at 10:42 PM

It's from the model card:

> unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

(stolen from https://jonready.com/blog/posts/claude-fable5-is-allowed-to-...)

mips_avatar yesterday at 10:57 PM

They've said that they'll stop notifying developers when this gets triggered. Instead, they'll load in what is basically a LoRA designed to inject bugs into your code.

ComputerGuru yesterday at 10:41 PM

Different restrictions. ML gets treated differently from the rest.

daedrdev yesterday at 10:43 PM

Specifically, only ML research.
