Why? Well, it depends. Most evidence suggests that Anthropic and OpenAI are making a lot of money on inference, so the question is whether it's more profitable for them to sell 100X tokens for Y, or 1X tokens for 100Y. In most industries with high fixed costs, low variable costs, and unlimited scalability (like LLM providers), the first option ends up being much more profitable.
Literally nobody is making money on inference