0xbadcafebee • today at 12:54 AM • 19 replies

There's two basic kinds of distillation: 1) the massive [and dumb] method where you ask a question and use the answer as reinforcement (Black Box), and 2) more targeted distillation where you use one model to directly inform/train/guide another model (RLAIF).

The latter is basically fine-tuning the model with direction from another model. Thousands of businesses do this every day to fine-tune. This is almost certainly what the Chinese labs are doing, since it has a much better effect on the end result than just getting simple answers to simple questions.

These complaints of distillation are inflating the problem to make it sound worse than it is, because they want the USG to block/ban Chinese model providers as protectionism. They have already called for more export controls on chips (which is funny because DeepSeek v4 was designed to run on Huawei chips and now the other Chinese providers are following suit). But they can't come right out and say that, so their claim is that they're asking for more export controls because distilled models might not be as safe as their own. But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.

Replies

anon373839 • today at 5:42 AM

> These complaints of distillation are inflating the problem to make it sound worse than it is

Unfortunately, the Reuters piece itself is complicit in this dramatization. The lede paragraph parrots Anthropic's talking point that distillation is an "attack", without using quotes that would alert the reader that this framing is a corporate talking point. Distillation is NOT an attack.

ALLTaken • today at 12:40 PM

They want to create a monopoly and destroy every competitor before they get a chance to rival them.

Why can't OSS rival closed-source software? It should be an open market, at least "somewhat". What's really happening? Will EU providers also get banned if they reach or exceed US model capabilities?

Closed-source providers (Meta, Google, OpenAI, Anthropic) can close your account on a whim, destroy your business, and then use the data you supplied to create a competitor.

gmerc • today at 4:57 AM

https://research.nvidia.com/labs/lpr/slm-agents/ - Distillation data is a natural byproduct of using these models. There's no effective defence against it. Anthropic is degrading thinking blocks to summaries to slow it down and hide model internals, but in the end, the math says you're SOL and it works on MNC/Large Corporate scale well enough that the moment cost becomes a priority, you're left without the lock in you need to keep customers paying.

giancarlostoro • today at 12:39 PM

Heck, one of my favorite fine-tuned copies of Qwen uses distilled Opus 4.6 reasoning. I'm not sure where the maintainer is based, but I'm in the States, and if I had the hardware to do similar things I would. It's like you say, basically everyone is doing it. It kind of makes sense to me too: you can have roughly similar data, but the reasoning logic is the real secret sauce in my eyes. It doesn't matter if you know everything in the world if you don't know how to reason with that information.

https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-...

summarybot • today at 1:50 PM

It's about training data: you use Claude to compare two outputs and indicate the better one. This gives you higher-quality training data that you can use to train a fresh set of weights. Weights don't get adjusted on the fly; instead, the training dataset is improved and then you train afresh. And it's hard to detect, because you're just asking the model which of these outputs for a given prompt is better, or something along those lines.
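A rough sketch of that loop in Python, for illustration only. `build_preference_dataset` and `toy_judge` are hypothetical names, and the judge is a length-based stand-in for what would really be an API call asking a stronger model which output is better. This is not any lab's actual pipeline.

```python
def build_preference_dataset(candidates, judge):
    """Turn pairs of candidate outputs into chosen/rejected training rows.

    candidates: dict mapping prompt -> (output_a, output_b)
    judge: callable(prompt, output_a, output_b) returning "A" or "B"
    """
    dataset = []
    for prompt, (output_a, output_b) in candidates.items():
        verdict = judge(prompt, output_a, output_b)
        if verdict == "A":
            chosen, rejected = output_a, output_b
        else:
            chosen, rejected = output_b, output_a
        dataset.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return dataset


def toy_judge(prompt, output_a, output_b):
    # Stand-in for the teacher model: it simply prefers the longer answer.
    # A real setup would send the prompt and both outputs to a stronger model.
    return "A" if len(output_a) >= len(output_b) else "B"


candidates = {
    "What is 2 + 2?": ("4", "2 + 2 = 4, because adding two pairs gives four."),
    "Name a prime number.": ("7 is prime: its only divisors are 1 and 7.", "7"),
}

dataset = build_preference_dataset(candidates, toy_judge)
for row in dataset:
    print(row["prompt"], "->", row["chosen"])
```

No weights change while this runs. The resulting rows are just a cleaner dataset that a later training run would consume.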

cm2187 • today at 6:57 AM

Stupid question: I was under the impression that these models were trained on petabytes of data. Surely the number of question/response pairs they can extract from querying a bigger model (Claude) is fairly modest. How is it not a drop in the bucket compared to the training dataset?

handoflixue • today at 6:20 AM

> But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.

They claim two things:

1) The specific, available jailbreak for Fable 5 is not dangerous - this has been confirmed by multiple experts, and there is no credible evidence against this claim (in other words, Anthropic is probably correct)

2) It is impossible to build an LLM that is immune to all jailbreaks. Again, there is no credible evidence against this claim, i.e. Anthropic is again entirely correct.

If #1 was false, they could just publish the details of the jailbreak - it supposedly only works on Fable 5, so there's no possible danger.

If #2 was false, surely some other LLM lab would have done it by now. Especially since a number of governments have made it clear there is a market for such a project.

dannyw • today at 3:31 AM

If you’re doing evals, you’re basically doing RLAIF without training a model; just looking at the results.

Fundamentally it is very difficult to stop this while still making your AI models useful.

sorenjan • today at 11:34 AM

Doesn't "real" distillation use the logits instead of the final tokens? I would classify this more like using a model to generate synthetic training data.

SubiculumCode • today at 7:47 AM

The compute deficit of Chinese AI companies is real, and it IS THE ONLY competitive advantage that Western companies have.

The only way the U.S. keeps that edge is to prevent distillation. The only way Chinese companies can make up for the deficit in compute is to distill. There is innovation in great supply on every side of the ocean. It's about the chips. And in terms of national security, for the U.S. and for China, it's about the chips and the distillation that undermines that advantage. This is an arms race.

janalsncm • today at 4:37 AM

Yeah I think the technical term is something more like “pseudo-labeling”. The OG distillation requires logits which Anthropic doesn’t provide.

lemax • today at 5:48 AM

I've used RLAIF to build out heuristic-based non-LLM models for various decision systems and achieved something like 95% F1 on certain projects. We're in a place where models can be used to fine-tune a lot of stuff via loops.

friendzis • today at 7:06 AM

> These complaints of distillation are inflating the problem to make it sound worse than it is

This is, in part, a problem every judicial and legislative system has faced since forever: form versus function.

Take a classic elicitation technique from espionage: a foreign spy meets a military officer or scientist at a bar, strikes up a conversation, wonders aloud how a missile could hit some target with some accuracy, and elicits the response that with laser guidance it is entirely possible. From this they learn that some technology for laser-guiding missiles exists. Or in retail: a competitor hires a secret buyer to purchase core baskets of goods and analyzes the prices on the receipts.

The function is espionage, the form is conversation, and all the info is in a sense provided willingly. Where do you set the slider?

These distillation "attacks" are not only indistinguishable from evals, they ARE evals. The function is training your own model, the form is an eval. Normally, one would expect a discussion grounded in risk-benefit analysis about which direction to push the legality slider. The problem with these recurring statements is that they invoke the enshittification of the legislature.

crazylogger • today at 10:12 AM

Chinese labs access Claude via API. Isn't it the black box method by definition?

killerstorm • today at 12:37 PM

I'm sorry, but you got the terminology exactly backwards. Training on the answer is called supervised fine-tuning.

Just for the sake of clarity:

0. Full distillation uses the logits of the teacher model, which carry much more information than the text itself. This is the kind of distillation used inside labs, but one can't distill Claude this way because logits are not available via the API.

1. Supervised fine-tuning on synthetic data might be called blackbox distillation. I guess that's what you meant in your case (1).

2. Reinforcement learning (like RLAIF) uses the least amount of information from the teacher, i.e. only a few bits per task.
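To make the gap between cases 0 and 1 concrete, here is a toy sketch in plain Python. The function names, the three-token vocabulary, and the logit values are all made up for illustration. Real training would use a tensor library and sum these losses over every position in every sequence.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    # KL(p || q) in nats; terms with p_i == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def logit_distillation_loss(teacher_logits, student_logits):
    # Case 0: the student matches the teacher's full next-token distribution.
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


def sft_loss(sampled_token, student_logits):
    # Case 1: cross-entropy against the single token the teacher emitted.
    return -math.log(softmax(student_logits)[sampled_token])


# Hypothetical logits over a 3-token vocabulary at one position.
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [0.5, 0.5, 0.5]

kd = logit_distillation_loss(teacher_logits, student_logits)
sft = sft_loss(0, student_logits)  # the API only reveals that token 0 was emitted

print(f"logit distillation loss: {kd:.4f}")
print(f"SFT loss on the sampled token: {sft:.4f}")
```

With logits, the student sees how the teacher ranks every token at every position. With text only, it sees one sampled token per position. A pairwise preference label, as in case 2, carries at most one bit per comparison.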

mannanj • today at 4:43 AM

>But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.

Yes, this is in line with what Anthropic said in their public statements about the Fable access restriction imposed by government directive. The hypocrisy and inconsistency in their statements and behavior feels quite childish and controlling. I believe our companies and their leaders, friends among our other influential leaders and leaders from rich social classes, want to actively hurt most people as this behavior looks to be quite self-interested.

fnord77 • today at 5:12 AM

Can you reach into the model and "transplant" weights directly?

JumpCrisscross • today at 6:33 AM

> These complaints of distillation are inflating the problem

They’re also missing the point. What would have happened to a member of the Manhattan Project who, through personal pursuit of profit, neglected their duty enough to let the bomb leak?

catigula • today at 1:13 PM

Chinese companies are engaging in anti-competitive practices, as usual. They are rogue actors on the economic scene. If it were feasible, they'd be widely banned, and for good reason.
