Maybe a good idea to be more explicit about this -- maybe a cost analysis benchmark would be a nice accompaniment.
This kind of thing keeps popping up each time a new model is released, and I don't think people are aware that token efficiency can change between models.
Their subscribers will see/feel the difference regardless; API pricing is hopefully read by devs who know about token efficiency and reasoning effort.
Agreed. It would be great if everyone started reporting cost per task alongside eval scores, especially in a world where you can spend arbitrary test-time compute. This is one thing I like about the Artificial Analysis website - they include cost to run alongside their eval scores: https://artificialanalysis.ai/
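To make the point concrete, here's a minimal sketch of why cost per task matters more than per-token price alone. The prices and token counts below are made-up illustrative numbers, not any real model's pricing; the point is that two models billed at the same rate can differ hugely in cost per task if one emits far more output/reasoning tokens for the same score.

```python
def cost_per_task(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one task, given token counts and per-million-token prices."""
    return (input_tokens / 1e6) * input_price_per_m + (output_tokens / 1e6) * output_price_per_m

# Hypothetical pricing: $3/M input tokens, $15/M output tokens for both models.
terse = cost_per_task(2_000, 500, 3.0, 15.0)      # 2k prompt, 0.5k completion
verbose = cost_per_task(2_000, 8_000, 3.0, 15.0)  # same prompt, 16x the output

print(f"terse model:   ${terse:.4f} per task")    # $0.0135
print(f"verbose model: ${verbose:.4f} per task")  # $0.1260
```

Same per-token price, ~9x the cost per task - which is exactly the difference a benchmark score alone won't show you.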