Hacker News

NitpickLawyer, yesterday at 5:36 PM

M2.7 is no longer open source; it's been changed to a non-commercial (NC) license. It's an OK model, but IME, out of the big 5 Chinese models (DS, GLM, Kimi, MiniMax and Qwen), DS models have generally shown better generalisation and real-world performance than all the others, even when their benchmark scores were lower. Less benchmaxxxing, basically.

DS4 also has some neat new architecture improvements that give it a lot of context at lower VRAM usage, so, B for B (parameter count for parameter count), it will be cheaper to serve than previous models.
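The VRAM saving referred to here is in the KV cache, which grows linearly with context length. As a rough illustration, here is a minimal Python sketch of the usual KV cache size estimate for standard grouped-query attention. The layer count, head counts and context length are hypothetical values, not DS4's or M2.7's actual configuration, and DS4's architecture may shrink the cache by other means (e.g. compression) that this formula does not model.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for a single sequence.

    Assumes a standard (grouped-query) attention layout: every layer stores
    one key vector and one value vector of size head_dim per KV head per token.
    bytes_per_elem=2 corresponds to fp16/bf16 storage.
    """
    # Leading factor 2 = one tensor for keys plus one for values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical configs for illustration only, NOT real model hyperparameters.
baseline = kv_cache_bytes(n_layers=60, n_kv_heads=8, head_dim=128, seq_len=128_000)
leaner = kv_cache_bytes(n_layers=60, n_kv_heads=2, head_dim=128, seq_len=128_000)

print(f"baseline KV cache: {baseline / 2**30:.1f} GiB per sequence")
print(f"leaner KV cache:   {leaner / 2**30:.1f} GiB per sequence")
```

On fixed hardware, memory freed from the cache can go to longer contexts or more concurrent sequences per GPU; it does not shrink the memory needed for the weights themselves.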


Replies

anonym29, yesterday at 5:56 PM

M2.7 was never open source, only open-weight, which fulfills a lot of the spirit of open source but isn't really the same thing as a whole. The non-commercial license is basically impossible to enforce if you're self-hosting anyway, because it's essentially impossible to prove that any individual commit was made by MiniMax M2.7 in an environment where multiple self-hosted models are being run side-by-side. Besides that, you're not obligated to abide by terms you never agreed to in the first place, and you don't need to agree to anyone's terms to download open weights from a peer or over a torrent. These weights amount to public information that freely exists and is shared in the commons; not a scarce, rivalrous good; not copyrighted works; not sensitive intellectual property.

The weights may nominally be legally copyrighted, but the rightsholder certainly doesn't seem to be making anything resembling a serious effort to actually assert or defend those rights; on the contrary, they are doing the exact opposite by maximizing the gratis distribution, including knowingly and willingly via third parties, with no copy protection whatsoever, and no reasonable expectation of non-distribution.

They are not behaving like an entity trying to protect valuable intellectual property, they are behaving like an entity trying to reap the reputational and network effect benefits of maximizing the free distribution of a public good.

Lower KV cache memory usage doesn't make a model cheaper to serve overall. Once you've acquired the hardware (and you need more of it to serve DS4L than MiniMax M2.7, since the former is a larger model by ~54B total params to begin with, a gap that KV cache efficiency does nothing to address), capex is basically fixed and opex just comes down to power draw. That will be marginally higher per token with DS4L than with M2.7, owing to the slower token generation (TG) that results from 13B vs 10B active params per forward pass.
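The hardware and power argument can be put into rough numbers. Below is a back-of-envelope Python sketch that uses only the figures from the comment (~54B extra total params, 13B vs 10B active params). The bit widths and the common "about 2 FLOPs per active parameter per generated token" rule of thumb are assumptions; real serving cost also depends on batch size, attention overhead and whether decoding is memory-bandwidth bound.

```python
GB = 10**9  # decimal gigabytes


def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Memory to hold the weights alone (no KV cache, activations or runtime overhead)."""
    return params_billion * 1e9 * bits_per_param / 8 / GB


def decode_flops_per_token(active_params_billion: float) -> float:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per active parameter per token."""
    return 2 * active_params_billion * 1e9


# Figures taken from the comment: ~54B more total params for DS4L,
# and 13B vs 10B active params per forward pass.
EXTRA_TOTAL_PARAMS_B = 54

for bits in (16, 8, 4):
    extra = weight_memory_gb(EXTRA_TOTAL_PARAMS_B, bits)
    print(f"extra weight memory for DS4L at {bits}-bit: {extra:.0f} GB")

ratio = decode_flops_per_token(13) / decode_flops_per_token(10)
print(f"naive per-token compute ratio DS4L / M2.7: {ratio:.2f}x")
```

This only gives the naive compute ratio; how much of it shows up as extra energy per token depends on the deployment.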
