> it shouldn't cost them more money
As things are currently, better models mean bigger models that take more storage+RAM+CPU, or just spend more time processing a request. All this translates to higher costs, and may be mitigated by particular configs triggered by knowledge that a given client, providing particular guarantees, is on the other side.
That’s kind of the point. Even if users can choose which model to use (and apparently the default is the largest one), they could still say (For roughly the same cost): your Opus quota is X, your Haiku quota is Y, go ham. We’ll throttle you when you hit the limit.