Hacker News

simianwords today at 6:45 AM

>It is not. It's a terrible comparison. Qwen, deepseek and other Chinese models are known for their 10x or even better efficiency compared to Anthropic's.

I find it a good comparison because it gives us a baseline, since we have zero insider knowledge of Anthropic. It gives me an idea of what a model of a certain size costs to run.

I don't buy the 10x efficiency claim: those models are just lagging behind the current SOTA. They perform much worse than the current frontier models while also costing much less - exactly what I would expect. Current Qwen models perform about as well as Sonnet 3, I think. Two years from now, once Chinese models catch up with enough distillation attacks, they'll be as good as Sonnet 4.6 and still profitable.


Replies

coldtea today at 11:16 AM

> I don't buy the 10x efficiency thing: they are just lagging behind the performance of current SOTA models. They perform much worse than the current models while also costing much less - exactly what I would expect.

Define "much worse".

  +-------------------------------+-------------+----------+------------------+
  | Benchmark                     | Claude Opus | DeepSeek | DeepSeek vs Opus |
  +-------------------------------+-------------+----------+------------------+
  | SWE-Bench Verified (coding)   | 80.9%       | 73.1%    | ~90%             |
  | MMLU (knowledge)              | ~91%        | ~88.5%   | ~97%             |
  | GPQA (hard science reasoning) | ~79–80%     | ~75–76%  | ~95%             |
  | MATH-500 (math reasoning)     | ~78%        | ~90%     | ~115%            |
  +-------------------------------+-------------+----------+------------------+
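(For anyone checking the last column: it's just DeepSeek's score divided by Opus's, as a percentage. A quick sketch, using the approximate midpoints of the ranges quoted above:)

```python
# Each "DeepSeek vs Opus" entry = 100 * deepseek_score / opus_score.
# Scores are the (approximate) midpoints of the figures in the table.
scores = {
    # benchmark: (claude_opus, deepseek)
    "SWE-Bench Verified": (80.9, 73.1),
    "MMLU": (91.0, 88.5),
    "GPQA": (79.5, 75.5),
    "MATH-500": (78.0, 90.0),
}

for name, (opus, deepseek) in scores.items():
    ratio = 100 * deepseek / opus
    print(f"{name}: ~{ratio:.0f}% of Opus")
# SWE-Bench Verified → ~90%, MMLU → ~97%, GPQA → ~95%, MATH-500 → ~115%
```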