> Qwen 3.5 397B-A17B is a good comparison
It is not. It's a terrible comparison. Qwen, deepseek and other Chinese models are known for their 10x or even better efficiency compared to Anthropic's.
That's why the difference between open router prices and those official providers isn't that different. Plus who knows what open routed providers do in term quantization. They may be getting 100x better efficiency, thus the competitive price.
That being said not all users max out their plan, so it's not like each user costs anthropic 5,000 USD. The hemoragy would be so brutal they would be out of business in months
Agree, but I guess the Opus 4.6 is 10x larger, rather than Chinese models being 10x more efficient. It is said that GPT-4 is already a 1.6T model, and Llama 4 behemoth is also much bigger than Chinese open-weight models. Chinese tech companies are short of frontier GPUs, but they did a lot of innovations on inference efficiency (Deepseek CEO Liang himself shows up in the author list of the related published papers).
> Plus who knows what open routed providers do in term quantization
The quantisation is shown on the provider section.
>It is not. It's a terrible comparison. Qwen, deepseek and other Chinese models are known for their 10x or even better efficiency compared to Anthropic's.
I find it a good comparison because it is a good baseline since we have zero insider knowledge of Anthropic. They give me an idea that a certain size of a model has a certain cost associated.
I don't buy the 10x efficiency thing: they are just lagging behind the performance of current SOTA models. They perform much worse than the current models while also costing much less - exactly what I would expect. Current Qwen models perform as good as Sonnet 3 I think. 2 years later when Chinese models catchup with enough distillation attacks, they would be as good as Sonnet 4.6 and still be profitable.
> That being said not all users max out their plan,
These are not cell phone plans which the average joe takes, they are plans purchased with the explicit goal of software development.
I would guess that 99 out of every 100 plans are purchased with the explicit goal of maxing them out.
That's a tautology. People think chinese models are 10x more efficient because they're 10x cheaper, and then you use that to claim that they're 10x more efficient.
Opus isn't that expensive to host. Look at Amazon Bedrock's t/s numbers for Opus 4.5 vs other chinese models. They're around the same order of magnitude- which means that Opus has roughly the same amount of active params as the chinese models.
Also, you can select BF16 or Q8 providers on openrouter.