But they don't show "strictly better" performance at cost per task!
The graphs show parts of the cost/performance Pareto frontier occupied by Opus 4.8 and other parts occupied by Sonnet 5.0. If Opus 4.8 were strictly better at cost per task, as you say, then by definition the entire frontier would be occupied by Opus.
So neither is Pareto-dominant over the other. In contrast, Sonnet 5.0 is Pareto-dominant over Sonnet 4.6 on those graphs.
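
To make the definition concrete, here is a minimal Python sketch of the dominance check I'm describing. The numbers are made up purely for illustration and are not read off the graphs; `model_a`, `model_b`, `model_c` and the helper names are hypothetical. Each point is (cost per task, score), where lower cost and higher score are better.

```python
from typing import List, Tuple

# (cost per task, score): lower cost is better, higher score is better.
Point = Tuple[float, float]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q on both axes and strictly better on at least one."""
    no_worse = p[0] <= q[0] and p[1] >= q[1]
    strictly_better = p[0] < q[0] or p[1] > q[1]
    return no_worse and strictly_better


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def set_dominates(a: List[Point], b: List[Point]) -> bool:
    """True if every point of b is dominated by some point of a."""
    return all(any(dominates(p, q) for p in a) for q in b)


# Hypothetical data, NOT taken from the graphs under discussion.
model_a = [(1.0, 60.0), (2.0, 70.0), (8.0, 90.0)]
model_b = [(4.0, 75.0), (10.0, 95.0)]
model_c = [(1.5, 50.0), (3.0, 65.0), (9.0, 85.0)]

# Neither a nor b dominates the other, so the combined frontier mixes both.
print(set_dominates(model_a, model_b), set_dominates(model_b, model_a))
print(pareto_frontier(model_a + model_b))

# a dominates c, so the combined frontier consists of a's points only.
print(set_dominates(model_a, model_c))
print(pareto_frontier(model_a + model_c))
```

With the first pair, the frontier contains points from both models, which is the "neither is Pareto-dominant" case. With the second pair, every point of `model_c` is beaten by some point of `model_a`, so the frontier is entirely `model_a`, which is what "strictly better at cost per task" would have to look like.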