M2.7 was never open source, only open weight, which fulfills much of the spirit of open source but isn't quite the same thing. The noncommercial license is basically impossible to enforce against self-hosters anyway: there's essentially no way to prove that any individual commit was made by Minimax M2.7 in an environment where multiple self-hosted models run side by side. Beyond that, you're not obligated to abide by terms you never agreed to in the first place, and you don't need to agree to anyone's terms to download open weights from a peer or over a torrent. In practice, these weights amount to public information that freely exists and is shared in the commons: not a scarce, rivalrous good, not guarded copyrighted works, not sensitive intellectual property.
The weights may nominally be copyrighted, but the rightsholder certainly isn't making anything resembling a serious effort to assert or defend those rights. On the contrary, they're doing the exact opposite: maximizing gratis distribution, including knowingly and willingly via third parties, with no copy protection whatsoever and no reasonable expectation of non-distribution.
They are not behaving like an entity trying to protect valuable intellectual property, they are behaving like an entity trying to reap the reputational and network effect benefits of maximizing the free distribution of a public good.
Less memory usage by the KV cache doesn't mean cheaper to serve overall. Once you've acquired the hardware (of which you need more to serve DS4L than Minimax M2.7, the former being a ~54B-total-params larger model to begin with, which KV cache memory efficiency does nothing to address), the capex is basically fixed and opex comes down to power draw. That will be marginally higher per token with DS4L than with M2.7, owing to the slower speeds that result from 13B active params vs 10B active params on forward passes during TG.
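To make the active-params point concrete, here's a back-of-envelope sketch. Token generation is roughly memory-bandwidth bound, so per-token cost scales with the bytes of weights read per forward pass, i.e. with active params. The byte-per-param figure is an assumption (roughly 8-bit quantized weights), not a measured number:

```python
# Back-of-envelope: TG speed is roughly memory-bandwidth bound, so relative
# per-token cost scales with bytes read per token (~active params).
# BYTES_PER_PARAM is an assumption (~8-bit quantization), not a measured figure.

ACTIVE_PARAMS = {"minimax-m2.7": 10e9, "deepseek-v4-flash": 13e9}
BYTES_PER_PARAM = 1.0

def relative_tg_cost(model_a: str, model_b: str) -> float:
    """Ratio of per-token weight reads: >1 means model_a is slower/costlier."""
    return (ACTIVE_PARAMS[model_a] * BYTES_PER_PARAM) / (
        ACTIVE_PARAMS[model_b] * BYTES_PER_PARAM
    )

ratio = relative_tg_cost("deepseek-v4-flash", "minimax-m2.7")
print(f"DS4L reads ~{ratio:.2f}x the bytes per generated token")  # ~1.30x
```

So per-token power draw should land in the ballpark of 30% higher for DS4L, all else equal, which is the "marginally higher opex" claim above.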
As for the second part of your message, it's really easy to verify yourself on OpenRouter.
DSv4-flash is currently being served at 0.14/0.24 $/MTok by most of the providers (8 as of writing this) and even a bit cheaper by 2 providers.
Minimax2.7 is being served at 0.30/1.20 $/MTok by most providers (4 providers as of writing this) and double that price by 2 providers.
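For a quick sanity check of those quoted prices, here's a per-request cost comparison. The prices come from the numbers above; the input/output token counts per request are hypothetical:

```python
# Per-request cost from the OpenRouter prices quoted above.
# Token counts per request are assumed for illustration.
prices = {  # (input $/MTok, output $/MTok)
    "deepseek-v4-flash": (0.14, 0.24),
    "minimax-2.7": (0.30, 1.20),
}

def request_cost(model: str, in_tok: int, out_tok: int) -> float:
    p_in, p_out = prices[model]
    return (in_tok * p_in + out_tok * p_out) / 1e6

# e.g. 8k input / 1k output tokens per request (assumed workload):
ds = request_cost("deepseek-v4-flash", 8000, 1000)
mm = request_cost("minimax-2.7", 8000, 1000)
print(f"DS4L: ${ds:.5f}  Minimax: ${mm:.5f}  ratio: {mm/ds:.1f}x")
```

On that assumed workload the hosted-price gap works out to roughly 2-3x in DS4L's favor, and it grows with output-heavy workloads because the output-token price gap (0.24 vs 1.20) is the larger one.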
As for the first part of your message, this is actually a good illustration of a common misunderstanding of LLM licensing. There are open-source models out there (Apache 2.0 and MIT), there are source-available ones (i.e. open weights) like the Llamas and Minimax2.7, and there's something in between with the latest Kimi (MIT w/ attribution). Open source in the context of LLMs means you get a license to run, inspect, modify, and re-release a model; it was never about the data or training. Treating it as being about data and training is a very common interpretation, but one that's wrong IMO. I get that it's contested though, so anyway, sorry for the tangent.
KV cache size is the main constraint on batching (at any given ctx length), which is a huge deal for efficiency both locally and in the data center. DeepSeek V4's reduced KV requirement is a real game changer: it genuinely unlocks batching requests together for local inference, not just at scale.
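The batching constraint is easy to see with arithmetic: whatever VRAM is left after loading weights gets divided among concurrent requests' KV caches. A minimal sketch, where the GQA config (layers, KV heads, head dim) is a hypothetical example, not either model's actual spec:

```python
# Why KV cache size caps batch size: max concurrent requests is roughly
# free memory / (per-token KV bytes * ctx length).
# The GQA config below is a hypothetical example, not an official model spec.

def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    # K and V each store n_kv_heads * head_dim values per layer (fp16 = 2 bytes)
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

def max_batch(free_vram_gb: float, ctx_len: int, kv_per_tok: int) -> int:
    return int(free_vram_gb * 1e9 // (ctx_len * kv_per_tok))

# Hypothetical config: 60 layers, 8 KV heads, head_dim 128, fp16
kv = kv_bytes_per_token(60, 8, 128)   # 245,760 bytes per token of context
print(max_batch(24, 32_768, kv))      # requests that fit in 24 GB of free VRAM
```

Halve the per-token KV footprint and the max batch size at a given ctx length roughly doubles, which is why a KV-efficient architecture matters even on a single local GPU serving a few parallel requests.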