Just reading the headline, I say good.
A) These models are trained by ignoring IP. It is hypocritical and absurd to then try to assert IP over them. And I am for the destruction of IP on all ends.
B) What this essentially means is that the Chinese labs are taking the work of these mega corporations and making it freely accessible to other labs and businesses, who can serve inference, fine-tune, and host it privately on-prem. That's clearly a good thing for competition in the market as a whole.
C) I don't see why we should have to duplicate the massive energy and infrastructure investment of building foundation models over and over forever just to preserve the IP rights of a few companies. That seems a shame. It seems better to me for everything to learn from everything else, so the whole ecosystem improves by each lab topping and building off the others. That's also why publishing research on the architecture and training of these models is so much better than what the proprietary labs do (keeping everything secret), although tbf Anthropic's interpretability research is cool.
D) These Chinese models give 90% of the performance of frontier proprietary models at a tenth or a twentieth of the cost. That seems like a win for everyone. Not to mention that distilling also lets them make much smaller local models that anyone can run. This is a win for actual democratization, decentralization, and accessibility for the little guy.
They're trying to kidnap what Anthropic has rightfully stolen!
Jokes and complete lack of sympathy aside, it does complicate the narrative that these small labs are always on the heels of the big labs for pennies on the dollar, if they rely on distilling the big labs' models. That means there still has to be big bucks coming from somewhere.
It's greed: now that they have all the data and infrastructure, they're pulling up the ladder.
Why do you think not a single one of these labs has released an open-source model distilled from its own SOTA model?
They all preach that they want to provide AI to everyone; wouldn't this be the best way to do it? Use your SOTA model to produce a lesser but open-source model.
A) The "IP" they're concerned about isn't the same IP you speak of. It's the investment in RL training / GPU hours that it takes to go from a base model to a usable frontier model.
B) I don't think the story is so clean. The distilled models often have regressions in important areas like safety and security (see, for example, NIST's evaluation of DeepSeek models). This might be why we don't see larger companies releasing their own tiny reasoning models so much. And copying isn't exactly healthy competition. Of course, I do find it useful as a researcher to experiment with small reasoning models -- but I do worry that the findings don't generalize well beyond that setting.
C) Maybe because we want lots of different perspectives on building models, lots of independent innovation. I think it's bad if every model is downstream of a couple "frontier" models. It's a monoculture problem, the same issue you see in cybersecurity more generally.
D) Is it really 90% of the performance, or are they just extremely targeted to benchmarks? I'd be cautious about running said local models for, e.g., my agent with access to the open web.
> And I am for the destruction of IP on all ends.
While I'm not unsympathetic to the plight of creatives, and their need to eat, I feel like the pendulum has swung so far to the interests of the copyright holders and away from the needs of the public that the bargain is no longer one I support. To the extent that AI is helping to expose the absurdity of this system, I'm all for it.
I don't think "burn it all down" is the answer, but I'd love to see the pendulum swing back our way.