Beyond the three options OP presents, I wonder if there's a fourth: BYO model.
Customers give vendors metered access to their model. They can budget tokens per vendor. Vendors selling "AI products" can have a cleaner story and win on the margin.
The first step is to iron out a reasonable protocol, basically authorizing an access token, and then the model providers (OpenAI, Anthropic, etc.) do the rate limiting. Theoretically this could be done by OpenRouter too.
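To make the idea concrete, here is a rough sketch of the metering side: the customer authorizes a per-vendor token budget, and whoever does the metering (the provider or a router) refuses requests that would exceed it. Every name here (VendorGrant, BudgetLedger, authorize, charge) is made up for illustration; no provider exposes this API as far as I know, and a real protocol would also need token issuance, expiry and revocation.

```python
from dataclasses import dataclass


@dataclass
class VendorGrant:
    """A scoped grant the customer issues to one vendor (hypothetical shape)."""
    vendor: str
    token_budget: int      # total model tokens the vendor may spend
    tokens_used: int = 0

    def remaining(self) -> int:
        return self.token_budget - self.tokens_used


class BudgetLedger:
    """Stands in for provider-side metering (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self._grants: dict[str, VendorGrant] = {}

    def authorize(self, vendor: str, token_budget: int) -> VendorGrant:
        # Customer grants a vendor metered access up to token_budget.
        grant = VendorGrant(vendor, token_budget)
        self._grants[vendor] = grant
        return grant

    def charge(self, vendor: str, tokens: int) -> bool:
        """Record usage; refuse if the vendor is unknown or over budget."""
        grant = self._grants.get(vendor)
        if grant is None or tokens < 0 or tokens > grant.remaining():
            return False
        grant.tokens_used += tokens
        return True


ledger = BudgetLedger()
grant = ledger.authorize("vendor-a", token_budget=1000)
first = ledger.charge("vendor-a", 600)    # fits within the budget
second = ledger.charge("vendor-a", 600)   # would exceed it, so refused
```

The point is that the budget lives with the customer's grant, not with the vendor, so the vendor never holds the customer's raw API key.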
But even so, do customers want an "AI product" packaged cleanly, or do they want to manage token capacity? They may be forced to do the latter.
It could happen, but it seems almost "regressive", as most companies are not at all ready to build this muscle.