Hacker News

cedws (today at 10:28 AM)

    ANTI_DISTILLATION_CC
    
    This is Anthropic's anti-distillation defence baked into Claude Code. When enabled, it injects anti_distillation: ['fake_tools'] into every API request, which causes the server to silently slip decoy tool definitions into the model's system prompt. The goal: if someone is scraping Claude Code's API traffic to train a competing model, the poisoned training data makes that distillation attempt less useful.
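To make the described flow concrete, here is a minimal sketch of what the client side might look like. Only the field `anti_distillation: ['fake_tools']` comes from the comment above; the function name, endpoint shape, and payload structure are illustrative assumptions, not Claude Code's actual implementation.

```python
import json

def build_request(messages, anti_distillation_enabled=True):
    """Build a hypothetical API request body; when the flag is on, the
    client asks the server to mix decoy tool definitions into the
    model's system prompt (payload shape is an assumption)."""
    body = {
        "model": "claude-example",  # placeholder model name
        "messages": messages,
    }
    if anti_distillation_enabled:
        # Server-side, this reportedly triggers the injection of fake
        # tool definitions that poison any scraped training data.
        body["anti_distillation"] = ["fake_tools"]
    return json.dumps(body)

req = build_request([{"role": "user", "content": "hello"}])
```

The point is that the client only sets a flag; the decoy tools themselves live server-side, so a scraper recording traffic never sees which tool definitions are real.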

Replies

nialse (today at 12:51 PM)

Paranoia. And also ironic, considering their base LLM is itself a distillation of the web, books, etc.

jjcm (today at 6:37 PM)

It looks like it worked, fwiw.

The qwen 27b model distilled on Opus 4.6 has some known issues with tool use specifically: https://x.com/KyleHessling1/status/2038695344339611783

Fascinating.

3form (today at 5:50 PM)

I was thinking just yesterday that the research Anthropic was sharing about how easy it is to poison training data was unlikely to have been conducted out of the goodness of their heart.

GorbachevyChase (today at 5:42 PM)

I like these guys less every day. The rate limits are so low that they're barely useful as a provider.

mmaunder (today at 2:43 PM)

Haven’t looked at the code, but is the server providing the client with a system prompt that it can use, which would contain fake tool definitions when this is enabled? What enables it? And why is the client still functional when it’s giving the server back a system prompt with fake tool definitions? Is the LLM trained to ignore those definitions?

Wonder if they’re also poisoning Sonnet or Opus directly generating simulated agentic conversations.
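One entirely speculative answer to the question of why the client stays functional: the client knows its own real tool set, so any tool call naming a server-injected decoy can simply be dropped before dispatch. All names below are invented for illustration; nothing here is from Claude Code's actual source.

```python
# Hypothetical allowlist of the client's real tools.
REAL_TOOLS = {"bash", "read_file", "write_file"}

def dispatch(tool_calls):
    """Execute only tool calls whose name is in the client's real set;
    calls to decoy tools injected server-side never match and are
    silently ignored."""
    executed, ignored = [], []
    for call in tool_calls:
        target = executed if call["name"] in REAL_TOOLS else ignored
        target.append(call["name"])
    return executed, ignored

executed, ignored = dispatch([
    {"name": "bash"},
    {"name": "decoy_search_index"},  # hypothetical fake tool name
])
```

Under this scheme no special model training is needed: even if the model occasionally calls a decoy, the client just refuses to execute it, while a scraper training on the traffic would learn the fake tool schemas as if they were real.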

crazylogger (today at 2:18 PM)

Why would this be in the client code though?