Look at cost per unit of intelligence or cost per task instead of cost per token.
How do I reliably measure 1 unit of intelligence?
Isn't the outcome / solution for a given task non-deterministic? So can we reliably measure that?
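One way to approach the non-determinism question is to treat each attempt at a task as a random trial, repeat it several times, and report a success rate with an uncertainty range alongside the cost per solved task. The sketch below assumes that each run can be graded as pass/fail and that its dollar cost is known. The function names and the sample numbers are made up for illustration, and the Wilson interval is just one common choice for putting error bars on a pass rate.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a pass rate (Wilson score interval)."""
    if trials == 0:
        raise ValueError("need at least one trial")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


def cost_per_solved_task(runs: list[tuple[bool, float]]) -> float:
    """Total spend divided by the number of runs that passed.

    Each run is (passed, cost). Failed runs still cost money, so they
    count toward the numerator but not the denominator.
    """
    total_cost = sum(cost for _, cost in runs)
    solved = sum(1 for passed, _ in runs if passed)
    if solved == 0:
        return math.inf  # nothing solved: cost per solved task is unbounded
    return total_cost / solved


# Hypothetical data: four attempts at the same task, (passed, cost in dollars).
runs = [(True, 0.02), (False, 0.03), (True, 0.01), (True, 0.04)]
passed = sum(1 for ok, _ in runs if ok)
low, high = wilson_interval(passed, len(runs))
print(f"pass rate: {passed / len(runs):.2f} (95% CI {low:.2f} to {high:.2f})")
print(f"cost per solved task: ${cost_per_solved_task(runs):.4f}")
```

With only a handful of runs the interval is wide, which is the point: a single run says little about a non-deterministic system, and the number of repetitions determines how much the measurement can be trusted.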