Hacker News

cedws (today at 10:28 AM)

    ANTI_DISTILLATION_CC
    
    This is Anthropic's anti-distillation defence baked into Claude Code. When enabled, it injects anti_distillation: ['fake_tools'] into every API request, which causes the server to silently slip decoy tool definitions into the model's system prompt. The goal: if someone is scraping Claude Code's API traffic to train a competing model, the poisoned training data makes that distillation attempt less useful.
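To make the described flow concrete, here is a minimal sketch of what the client side might look like. Only the field `anti_distillation: ['fake_tools']` comes from the comment above; the function name, endpoint shape, and payload structure are illustrative assumptions, not Claude Code's actual implementation.

```python
import json

def build_request(messages, anti_distillation_enabled=True):
    """Build a hypothetical API request body; when the flag is on, the
    client asks the server to mix decoy tool definitions into the
    model's system prompt (payload shape is an assumption)."""
    body = {
        "model": "claude-example",  # placeholder model name
        "messages": messages,
    }
    if anti_distillation_enabled:
        # Server-side, this reportedly triggers the injection of fake
        # tool definitions that poison any scraped training data.
        body["anti_distillation"] = ["fake_tools"]
    return json.dumps(body)

req = build_request([{"role": "user", "content": "hello"}])
```

The point is that the client only sets a flag; the decoy tools themselves live server-side, so a scraper recording traffic never sees which tool definitions are real.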

Replies

nialse (today at 12:51 PM)

Paranoia. And also ironic, considering their base LLM is itself a distillation of the web, books, etc.

jjcm (today at 6:37 PM)

It looks like it worked, fwiw.

The qwen 27b model distilled on Opus 4.6 has some known issues with tool use specifically: https://x.com/KyleHessling1/status/2038695344339611783

Fascinating.

3form (today at 5:50 PM)

I was thinking just yesterday that the research Anthropic was sharing about how easy it is to poison training data was unlikely to have been conducted out of the goodness of their heart.

GorbachevyChase (today at 5:42 PM)

I like these guys less every day. The rate limits are so low that they're barely useful as a provider.

mmaunder (today at 2:43 PM)

Haven’t looked at the code, but is the server providing the client with a system prompt that it can use, which would contain fake tool definitions when this is enabled? What enables it? And why is the client still functional when it’s giving the server back a system prompt with fake tool definitions? Is the LLM trained to ignore those definitions?

Wonder if they’re also poisoning Sonnet or Opus directly generating simulated agentic conversations.
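One entirely speculative answer to the question of why the client stays functional: the client knows its own real tool set, so any tool call naming a server-injected decoy can simply be dropped before dispatch. All names below are invented for illustration; nothing here is from Claude Code's actual source.

```python
# Hypothetical allowlist of the client's real tools.
REAL_TOOLS = {"bash", "read_file", "write_file"}

def dispatch(tool_calls):
    """Execute only tool calls whose name is in the client's real set;
    calls to decoy tools injected server-side never match and are
    silently ignored."""
    executed, ignored = [], []
    for call in tool_calls:
        target = executed if call["name"] in REAL_TOOLS else ignored
        target.append(call["name"])
    return executed, ignored

executed, ignored = dispatch([
    {"name": "bash"},
    {"name": "decoy_search_index"},  # hypothetical fake tool name
])
```

Under this scheme no special model training is needed: even if the model occasionally calls a decoy, the client just refuses to execute it, while a scraper training on the traffic would learn the fake tool schemas as if they were real.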

crazylogger (today at 2:18 PM)

Why would this be in the client code though?