While I agree directionally, I'll caveat that "cost per token" != "cost per task". Qwen3.6 tends to emit about 1.6x more output tokens than Haiku (mostly reasoning), so Haiku ends up only about 2.2x as expensive on the same tasks ($620 vs $280 to run the index). More detail from comparing their Artificial Analysis metrics:
Qwen3.6-35B-A3B vs Claude Haiku 4.5
reasoning mode · AA Intelligence Index v4.0

46.0 ┤  ↖ better: cheaper · smarter · faster
     │
     │
44.0 ┤     ╭─────╮
     │     │  ●  │ Qwen3.6-35B-A3B
     │     ╰─────╯
42.0 ┤
     │
     │
40.0 ┤
     │
     │
38.0 ┤                                        ╭───╮
     │                       Claude Haiku 4.5 │ ○ │
     │                                        ╰───╯
36.0 ┤
     └┬─────────┬─────────┬─────────┬─────────┬────────┬
     $200      $300      $400      $500      $600     $700

     x → cost to run the index (USD)   lower is better
     y → AA intelligence index         higher is better
     bubble area = output speed (tokens / sec)

     ╭─────╮                   ╭───╮
     │  ●  │ Qwen ~196 t/s     │ ○ │ Haiku ~93 t/s
     ╰─────╯                   ╰───╯

┌─────────────────────┬──────────┬──────────┬───────────┐
│ model │ AA index │ run cost │ out speed │
├─────────────────────┼──────────┼──────────┼───────────┤
│ Qwen3.6-35B-A3B ●│ 43.5 │ $280 │ 196 t/s │
│ Claude Haiku 4.5 ○│ 37.1 │ $620 │ 93 t/s │
└─────────────────────┴──────────┴──────────┴───────────┘

COST PER TOKEN ≠ COST PER TASK

output tokens per index run:
  Haiku 4.5    87.3M  ( 79.3M reasoning +  8.0M answer)
  Qwen3.6     143.2M  (131.7M reasoning + 11.5M answer)
  → Qwen emits 1.64× more output

── output speed (tokens / sec) ────────── raw rate · higher = faster
  Qwen3.6     100%   ~196 t/s
  Haiku 4.5   ~47%   ~93 t/s
  → Qwen ~2.1× faster per token
        ╎  1.64× more tokens < 2.1× faster rate
        ▼
── solution speed (per finished answer) ── higher = faster
  Qwen3.6     100%
  Haiku 4.5   ~78%
  → Qwen ~1.3× FASTER to a solution

SCORECARD
                     intelligence   cost / index run   speed to solution
  Qwen3.6-35B-A3B        43.5            $280            ~1.3× faster
  Claude Haiku 4.5       37.1            $620            (slower)
  → Qwen wins all three. The reasoning blow-up (1.64×) is smaller than
    the raw-speed edge (2.1×), so Qwen stays ahead per task.
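
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the ratios from the figures above. The dictionary layout and the function names (`output_mtok`, `generation_seconds`, `compare`) are my own for illustration, not anything from Artificial Analysis. It also assumes that time to a solution is simply output tokens divided by output speed, which ignores time to first token, queueing, and any parallelism across tasks.

```python
# Figures copied from the table and token counts above.
# Token counts are in millions of output tokens per index run.
MODELS = {
    "Qwen3.6-35B-A3B": {
        "index": 43.5, "run_cost_usd": 280, "tokens_per_sec": 196,
        "reasoning_mtok": 131.7, "answer_mtok": 11.5,
    },
    "Claude Haiku 4.5": {
        "index": 37.1, "run_cost_usd": 620, "tokens_per_sec": 93,
        "reasoning_mtok": 79.3, "answer_mtok": 8.0,
    },
}


def output_mtok(model):
    """Total output tokens (millions) = reasoning + answer."""
    return model["reasoning_mtok"] + model["answer_mtok"]


def generation_seconds(model):
    """Assumption: time = tokens / rate, with no latency or parallelism."""
    return output_mtok(model) * 1e6 / model["tokens_per_sec"]


def compare(a, b):
    """Ratios of model a relative to model b."""
    return {
        # How many times more output tokens a emits than b.
        "token_ratio": output_mtok(a) / output_mtok(b),
        # How many times faster a is per token.
        "rate_ratio": a["tokens_per_sec"] / b["tokens_per_sec"],
        # a's generation time as a fraction of b's (below 1 means a finishes first).
        "time_ratio": generation_seconds(a) / generation_seconds(b),
        # How many times more b costs than a for the same index run.
        "cost_ratio": b["run_cost_usd"] / a["run_cost_usd"],
    }


if __name__ == "__main__":
    ratios = compare(MODELS["Qwen3.6-35B-A3B"], MODELS["Claude Haiku 4.5"])
    for name, value in ratios.items():
        print(f"{name}: {value:.2f}")
```

The `time_ratio` is just `token_ratio / rate_ratio`, which is why the extra reasoning tokens do not erase the per-token speed advantage as long as the first ratio stays below the second.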
How did you get that nicely formatted graph and table in your post?!