It's extra interesting because I think the model people should be talking about is not DeepSeek V4 Pro but the Flash version. Accounting for cache hits, the effective input price (per OpenRouter) is only about 6 cents per million tokens (3 cents on a cache hit vs. 14 on a miss), with output at 28 cents. That's really good efficiency, and it's the normal price, not a sale price like the one they're running on V4 Pro.
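For anyone who wants to check the 6 cent figure, here is a rough sketch of the blended input price as a weighted average of the hit and miss prices. The function names are made up for illustration, and the linear blend is an assumption about how the effective price is derived; the 3/14 cent prices are the ones quoted above.

```python
# Prices in cents per million input tokens, as quoted above.
HIT_PRICE = 3.0
MISS_PRICE = 14.0

def blended_input_price(hit_rate, hit_price=HIT_PRICE, miss_price=MISS_PRICE):
    """Effective input price for a given cache hit rate (0.0 to 1.0).

    Assumes a simple weighted average of hit and miss prices.
    """
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price

def implied_hit_rate(effective_price, hit_price=HIT_PRICE, miss_price=MISS_PRICE):
    """Cache hit rate that would produce the given effective input price."""
    return (miss_price - effective_price) / (miss_price - hit_price)

rate = implied_hit_rate(6.0)
print(f"implied cache hit rate: {rate:.1%}")
print(f"blended price at that rate: {blended_input_price(rate):.2f} cents/M")
```

Under that assumption, a 6 cent effective price corresponds to a cache hit rate of 8/11, a bit under 73%.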
It's actually pretty difficult to find a good comparison model because there isn't one. Again, this is a 14/28 cent in/out model, ignoring cache, and it scores just below GPT 5.4 Mini-xhigh (75/450) and Gemini 3 Flash (50/300) in intelligence. It's similar to Gemma 4 31B (13/38) in some metrics, including cost, so it's not completely unheard of, but it's pretty notable that virtually everything else in the same region on most benchmarks is going to cost at least 5 times more (much, much more in very output-heavy contexts).
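To make the "how many times more" comparison concrete for a given workload, here is a small sketch that prices a job from its input and output token counts. It assumes the parenthesized figures above are cents per million tokens (in/out) and ignores caching; the function names and the example token mix are hypothetical.

```python
# (input, output) prices in cents per million tokens, taken from the comment above.
PRICES = {
    "DeepSeek V4 Flash": (14, 28),
    "GPT 5.4 Mini": (75, 450),
    "Gemini 3 Flash": (50, 300),
    "Gemma 4 31B": (13, 38),
}

def job_cost_cents(model, input_tokens, output_tokens):
    """Cost of one job in cents, ignoring cache discounts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def cost_ratio(model, input_tokens, output_tokens, baseline="DeepSeek V4 Flash"):
    """How many times more expensive `model` is than `baseline` for this job."""
    return (job_cost_cents(model, input_tokens, output_tokens)
            / job_cost_cents(baseline, input_tokens, output_tokens))

# Hypothetical workload: 10M input tokens, 1M output tokens.
for name in PRICES:
    print(f"{name}: {cost_ratio(name, 10_000_000, 1_000_000):.2f}x")
```

The output price gap is wider than the input gap for both GPT 5.4 Mini and Gemini 3 Flash, so the ratio climbs as the share of output tokens goes up.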
It's well priced, but does that have much relevance for "state of the art coding models" specifically?
I wouldn't use Gemini 3 Flash or GPT 5.4 mini for anything except the most trivial work, although both are useful for basic exploratory work.
So I'm using a heavy model for the bulk of the work, and its cost so far outweighs the light model's that the light model's cost is effectively irrelevant.