Serving a single user is likely not profitable, but total throughput rises sharply when serving many concurrent users: each decode step reads the same weights once and generates a token for every request in the batch, so the fixed cost of the hardware is amortized across all users.
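A rough back-of-envelope sketch of why batching helps. All the numbers here (GPU cost, decode steps per second) are made-up illustrative assumptions, and it assumes the memory-bandwidth-bound regime where step latency is roughly flat as batch size grows:

```python
# Decode is typically memory-bandwidth-bound: reading the weights once per
# step is roughly a fixed cost, shared by every request in the batch.
# Numbers below are illustrative assumptions, not real provider figures.

def tokens_per_dollar(batch_size, gpu_cost_per_hour=2.0, steps_per_second=50):
    # Each decode step emits one token per request in the batch.
    tokens_per_hour = batch_size * steps_per_second * 3600
    return tokens_per_hour / gpu_cost_per_hour

for b in (1, 32, 256):
    print(f"batch={b:>3}: {tokens_per_dollar(b):,.0f} tokens per dollar")
```

With these toy numbers, going from a batch of 1 to a batch of 256 improves tokens per dollar by 256x; in practice the gain tapers off once the GPU becomes compute-bound, but the direction is the same.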
Also, a lot of the revenue comes from input tokens and cached tokens, which are far cheaper to process than output tokens.
DeepSeek published their math for serving the V3/R1 models and reported a theoretical cost-profit margin of 545%: https://github.com/deepseek-ai/open-infra-index/blob/main/20...