Hacker News

The gap between open weights LLMs and closed source LLMs

293 points by kkm yesterday at 9:14 PM | 221 comments

Comments

profsummergig yesterday at 9:45 PM

IMHO, the biggest problem for the future of open weights models is that they currently exist only through the philanthropy of some private org (e.g. DeepSeek).

The spigot can be turned off at any time.

Until there's some sort of "community owned hardware", open weights models are always at risk of being discontinued.

taffydavid today at 10:29 AM

> Now is probably a good time to liquidate your pension, fly to a remote island somewhere, and live out the remaining 6 months or so of civilization in peace.

> So maybe the open source apocalypse won’t happen yet.

Sorry, I wasn't at the last doomer meeting. When did we decide good open source models are a harbinger of the apocalypse?

christina97 yesterday at 10:49 PM

The Chinese models will not overtake the frontier US ones given the current way things are going. The US models derive their lead from incredible efforts to source more and higher quality (mostly synthetic) data via great feats (e.g. generating with humongous teacher models that could never feasibly serve interactive traffic). The Chinese models advance via heroic efforts to optimize models and great feats to secure more and higher quality training data from the US frontier models.

For a (Chinese) open weight model to surpass the (US lab) frontier models, this equation must flip: the Chinese labs must entirely retool from harvesting frontier model data to building the data systems needed to produce novel data, as well as procuring latest-generation hardware en masse for this. This does not happen easily. Also, training a frontier-scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.

cedws today at 12:10 AM

I haven’t seen it discussed anywhere, but closed models can essentially cheat benchmarks, right? What Anthropic or OpenAI brand as a model doesn’t necessarily have to be just weights; it can be a whole backend system that augments the model itself. With this they can score better on benchmarks than an open source model that is weights alone.

swiftcoder today at 12:35 PM

> What is notable is that a large amount of the total improvement of models has been in the coding benchmark. The coding index has gone from 15 months behind to only a month or two behind

This makes sense, right? Coding is one of the most obvious short-term uses of models. It also has a ready-made market willing to pay a lot for tokens, a huge corpus to work with, and a significant degree of validation built into the problem domain...

jacobgold yesterday at 9:49 PM

It would be interesting to know how much of a boost the closed-model companies are giving the open models.

If the closed models stop improving, will the progress of open models slow?

linzhangrun today at 2:27 AM

The USA, a country known as the land of freedom, is now restricting frontier models to the point where non-Americans cannot even use them.

China, an "authoritarian state", "the antonym of freedom", with a software industry that is especially capitalist, has produced all the competitive open-weight models.

It really is IRONIC.

Disclosure: I am Chinese, and I understand this strategy comes from being behind, using open source as an asymmetric way to compete and make up for missing compute by sharing the burden, etc. But still, it is very ironic.

gehsty yesterday at 10:04 PM

Interesting to consider this in line with the recent US export bans. Could the US be squandering its lead by giving the open source, largely Chinese labs a chance to catch up (in terms of model quality available to the masses)? Will US labs be able to maintain the lead without users being able to use their latest models?

mft_ today at 9:06 AM

If the belief that open-weight/Chinese models depend significantly on distillation of the latest frontier models is correct, then presumably the gap will stabilise to the minimum time required for extraction of meaningful data (from the latest frontier model) plus finalisation of training of the latest dependent model. This gap can be minimised by increasing the process efficiency, but can't be eliminated entirely. (Attempts to hinder distillation from Anthropic/OpenAI may shift the balance too.)

zkmon today at 10:34 AM

What matters might not be the gap itself. For the bulk of AI users, the sufficiency of a model's capabilities is all that matters. If an open-weight model meets their requirements and is far cheaper than a closed-weights model, then they have no reason not to go for the open-weight model.

tzs yesterday at 11:41 PM

I wonder if a lot of the companies and governments that seem to think it is essential to be on the forefront of applying leading edge LLMs to the point of starting to become dependent on them are going to find themselves in a situation like that from the Arthur C. Clarke short story "Superiority"? [1] [2].

[1] The story: https://nob.cs.ucdavis.edu/classes/ecs153-2019-04/readings/s...

[2] Wikipedia: https://en.wikipedia.org/wiki/Superiority_(short_story)

samat yesterday at 9:41 PM

Article confuses open source models with open weights models.

Not the same thing.

The term is used correctly in the article's body, but the title is misleading.

jessinra98 today at 4:14 PM

Curious what other people's tipping point has been for picking one over the other.

kuchta today at 10:47 AM

Are there really any Open Source models there? Open Weights yes, but Open Source requires open-sourcing all the training data too; otherwise you can't reproduce the weights in the same manner as you would reproduce binaries from Open Source code.

_pdp_ yesterday at 10:46 PM

Frankly, it does not matter if there is a gap, because for most practical use-cases the end user can barely perceive the difference in intelligence.

On paper frontier models will be ahead of the curve, but I think hardly anyone will be able to tell if a piece of work, say a landing page, was created with Fable or GLM, and that is the point. The perceptible intelligence will reach a point beyond which it is no longer considered, except for some narrow use-cases.

JumpCrisscross yesterday at 10:25 PM

Now let’s look at the economics of buying versus renting. I’ve seen a lot of attention given to hardware capital costs. But a comment the other day got me thinking about power costs, too—at what performance differential do these factors intersect to make on-prem economically competitive with datacenters for businesses?

dabinat yesterday at 10:46 PM

I believe the open model party will eventually end. Perhaps because companies realize it’s too much of a commercial advantage, countries don’t want to give other countries commercial or military help, or maybe even an outright ban after someone uses an open model to guide them through how to make a bomb.

jackconsidine yesterday at 9:44 PM

Achilles and the tortoise [0] is usually a fallacy. If the tortoise has a head start, then Achilles will never catch it, because in the time it takes Achilles to reach the tortoise's location the tortoise has moved some degree further, ad infinitum. This is obviously not real, because Achilles will pass the tortoise. I think it's a fallacy because the framing creates a fake asymptote (they will both pass the point where they're approaching a tie).

In this case it may actually apply though, no? Open models get better from closed model distillation?

[0] https://en.wikipedia.org/wiki/Zeno%27s_paradoxes
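
For reference, a worked version of why the classic paradox resolves, under the simplifying assumption of constant speeds. The symbols $v_A$ (chaser), $v_T$ (leader), head start $d$, and lag $\tau$ are illustrative and do not come from the linked article:

```latex
% Stage n: Achilles covers the gap left over from stage n-1.
% Assumption: constant speeds with v_A > v_T, head start d.
\[
  t_n = \frac{d}{v_A}\left(\frac{v_T}{v_A}\right)^{n},
  \qquad
  \sum_{n=0}^{\infty} t_n
    = \frac{d}{v_A}\cdot\frac{1}{1 - v_T/v_A}
    = \frac{d}{v_A - v_T} < \infty .
\]
% Contrast (a toy model, not a claim about real labs): if the chaser's
% position is tied to the leader's with a fixed lag \tau,
\[
  x_A(t) = x_T(t - \tau)
  \quad\Longrightarrow\quad
  x_T(t) - x_A(t) = v_T\,\tau \quad \text{for all } t .
\]
```

The infinitely many stages sum to a finite time, so an independent chaser does pass the leader. In the toy lagged model the gap stays constant instead, which is roughly the distillation scenario raised above.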

doctoboggan yesterday at 11:24 PM

If the Chinese government is as involved in LLM development strategy as many people claim, wouldn't you expect them to immediately cease releasing open weight models and restrict access as soon as they start producing the frontier models? I am assuming this is what the USG thinks and is why they are trying to cut off the flow to foreign nationals ASAP.

LLMs are an undeniably valuable tool, and governments like to control those.

zb3 today at 3:29 AM

I just hope the CCP doesn't follow the US government and pull the plug before their companies release something on par with the US frontier models. The question is whether US models not available to the general public will count.

The question is not whether they'll prohibit open-weight models better than the US ones, because we all know the obvious answer.

justindotdev yesterday at 10:04 PM

at first glance, these graphs are confusing

maxiniol yesterday at 11:41 PM

Am I the only one flagging inconsistencies in the different evaluations on the 18 benchmarks? Why is the closed frontier model sometimes Grok, and then Opus 4.8? And why is it compared to GLM 5.2 once, and sometimes to Kimi 2.6?

ChrisArchitect today at 12:56 AM

Related:

The unbearable cheapness of open weight models

https://news.ycombinator.com/item?id=48668255

casey2 today at 1:43 AM

This is just an example of "lying with statistics". Going by compute efficiency, the gap has already closed (both in training and inference, coincidentally).

StreamCtx today at 5:26 AM

[flagged]

llmslave yesterday at 10:11 PM

The gap is huge and I'm tired of reading these articles constantly.

sinuhe69 today at 9:07 AM

At this point, I think open weights vs proprietary models is a misnomer.

First, we cannot be sure the next release will remain open weights, as Qwen 3.7 has shown.

And second, they are all Chinese models. So instead of "open weights", perhaps "Chinese AI models" is a better term.