At least tokens are equivalent to measuring 'thinking'... I wouldn't mind if it burned 100k tokens to output a one line change to fix a bug.
The problem is maximizing code generated per token spent. This model of "efficiency" is fundamentally broken.