A) The "IP" they're concerned about isn't the same IP you speak of. It's the investment in RL training / GPU hours that it takes to go from a base model to a usable frontier model.
B) I don't think the story is so clean. The distilled models often have regressions in important areas like safety and security (see, for example, NIST's evaluation of DeepSeek models). This might be why we don't see larger companies releasing their own tiny reasoning models so much. And copying isn't exactly healthy competition. Of course, I do find it useful as a researcher to experiment with small reasoning models -- but I do worry that the findings don't generalize well beyond that setting.
C) Maybe because we want lots of different perspectives on building models, lots of independent innovation. I think it's bad if every model is downstream of a couple "frontier" models. It's an issue of monoculture, like in cybersecurity more generally.
D) Is it really 90% of the performance, or are they just extremely targeted to benchmarks? I'd be cautious about running said local models for, e.g., my agent with access to the open web.
Fair points, and worth responding to for a more nuanced discussion! I hope you take these responses in that light :)
A) Well, sure, yes, it's different specific IP being distilled versus what was trained on. But I don't see why the same principles shouldn't apply to both. If companies ignore IP when training on material, then it should be okay for other companies to ignore IP when distilling from their outputs — either IP is a thing we care about or it isn't. (I don't.)
B) I'm really not sure how seriously I take the worries about the safety and security of RL'd models. You can RL a model to refuse to hack something or make a bioweapon or whatever as much as you want, but ultimately, for one thing, the model won't be capable of helping a person who has no idea what they're doing do serious harm anyway. And for another thing, the internet already exists for finding information on that stuff. And finally, people are always going to build jailbroken models anyway. I guess the only safety-related concern I have with models is sycophancy, and from what I've seen, there's no clear trend where closed frontier models are less sycophantic than open-source ones. In fact, quite the opposite, at least in the sense that the Kimi models are significantly less sycophantic than everyone else's.
C) This is a pretty fair point. I definitely think that having more base frontier models in the world, trained separately based on independent innovations, would be a good thing. I'm definitely in favor of having more perspectives.
But it seems to me that there's not much room for diversity of perspective in training a base frontier model anyway, because everyone is already using the maximum amount of data available. So those training sets are going to be basically identical.
And as for distilling the RL behaviors and so on of the models, this distillation process is still just a part of what the Chinese labs do — they've also all got their own extensive pre-training and RL systems, and especially RL with different focuses and model personalities, and so on.
They've also got diverse architectures, and I suspect very different ones from what's going on under the hood at the big frontier labs, considering, for instance, that we're seeing DSA and other hybrid attention schemes make their way into the Chinese model mainstream, along with high variation in model size, sparsity, and so on.
D) I find that for basically all the tasks I perform, the open models, especially since K2T and now K2.5, are more than sufficient, and I'd say the kind of agentic coding, research, and writing review I do is both very broad and pretty representative. So for 90% of the tasks you would use an AI for, the difference between the large frontier models and the best open-weight models is indistinguishable, simply because both have saturated them. They're 90% equivalent even if they're not within 10% in terms of capability on the very hardest tasks.
> Maybe because we want lots of different perspectives on building models, lots of independent innovation.
That’s only really possible if the front runners don’t buy up all of the chips on the market.