Hacker News

wongarsu · yesterday at 1:44 PM

I can get Kimi K2.5 inference on openrouter for about $0.5/MTok input + $2.5/MTok output, from six providers that have no moat besides efficiently selling GPU time. We can assume they are doing so at a profit (they have no incentive to do this at a loss), giving us those numbers as the cost to serve a 1T-a32b model at scale.

Now we don't know the true size of any of the proprietary models, but my educated guess is that Sonnet is in about the same parameter range, just with better training and much better fine tuning and RLHF. Yet API pricing for Sonnet is $3/MTok input + $15/MTok output, exactly six times as expensive. Even Haiku is twice as expensive as Kimi K2.5.
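The "exactly six times" claim is easy to verify from the quoted prices. A quick sketch (prices as quoted in this thread; model names are just labels):

```python
# Per-million-token API prices (USD) as quoted above.
PRICES = {
    "kimi-k2.5": {"input": 0.5, "output": 2.5},   # via OpenRouter providers
    "sonnet":    {"input": 3.0, "output": 15.0},  # Anthropic list price
}

def price_ratio(a: str, b: str) -> dict:
    """How many times more expensive model `a` is than `b`, per direction."""
    return {d: PRICES[a][d] / PRICES[b][d] for d in ("input", "output")}

print(price_ratio("sonnet", "kimi-k2.5"))  # {'input': 6.0, 'output': 6.0}
```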

I find it difficult to believe in a world where those API prices aren't profitable. For subscription pricing it's harder to tell. We hear about those who get insane value out of their subscription, but there has to be a large mass of users who never reach their limits. With company-wide rollouts there may even be a lot of subscription users who consume virtually no tokens at all.
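The subscription argument is about the usage distribution, not the heaviest users. A toy illustration with entirely invented numbers (price, blended cost, and distribution are all hypothetical):

```python
# Hypothetical: is a flat subscription profitable on average?
SUBSCRIPTION_PRICE = 20.0   # $/month, invented
COST_PER_MTOK = 2.5         # blended serving cost in $/MTok, invented

# (share of subscribers, MTok consumed per month) -- invented distribution:
# most users barely touch it, a small tail hammers it.
usage = [(0.70, 0.1), (0.25, 3.0), (0.05, 40.0)]

avg_cost = sum(share * mtok * COST_PER_MTOK for share, mtok in usage)
print(f"avg serving cost: ${avg_cost:.2f}/subscriber")  # well under $20 here
```

With this made-up distribution the 5% of heavy users look wildly unprofitable in isolation, yet the blended cost per subscriber stays far below the subscription price.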


Replies

yobbo · yesterday at 2:05 PM

> We can assume they are doing so at a profit

This is false. We may assume it's the most efficient way of generating revenue given their GPUs, but their overall profitability is just a guess. They would still have an incentive to run hardware at maximum capacity, even when it's uncertain whether they will eventually recoup costs.

> a world where those API prices aren't profitable

A lab with employees and models in training has other costs than the operating expenses of a GPU farm.

FiberBundle · yesterday at 3:04 PM

This is like saying that innovative medical drugs could be sold at a profit if there were no patent protection and the innovative companies would still invest in R&D. Yes, at the token level pure inference might be profitable, but the frontier AI labs will surely have to recoup their R&D investments at some point.

jerojero · yesterday at 1:49 PM

Companies building foundation models need to cover the cost of training, which is much more expensive than training something like Kimi.

ZitchDog · yesterday at 2:06 PM

> they have no incentive to do this at a loss

Are you sure? Surely there is a lot of interesting data in those LLM interactions.

KallDrexx · yesterday at 4:20 PM

The problem I have with this analysis is that it misses the multi-dimensional aspect of "is this profitable".

It's fair to say that if all these operators are competing on tokens, then the OpenRouter inference providers (not sure of the exact term, but the people running the models) are pricing in some level of margin.

However, how many of these are running their own data centers and GPUs?

If they are running their own infrastructure, then it's not a simple question of whether each batch of tokens is profitable, since it also needs to account for the cost of running the data center. They may believe it's profitable in the long term by riding the long tail of asset depreciation, but that isn't guaranteed.
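Whether owning the hardware pays off reduces to a depreciation-window calculation. A sketch with entirely invented numbers (GPU cost, throughput, margin, and utilization are all hypothetical):

```python
# Hypothetical: does a self-owned GPU recoup its cost over its
# depreciation window? All figures below are invented for illustration.
GPU_COST = 30_000.0     # $ per GPU
LIFETIME_YEARS = 4      # depreciation window
TOKENS_PER_SEC = 500    # sustained output throughput
MARGIN_PER_MTOK = 1.0   # revenue minus power/hosting, $/MTok
UTILIZATION = 0.6       # fraction of time actually serving paid tokens

seconds = LIFETIME_YEARS * 365 * 24 * 3600
mtok_served = seconds * UTILIZATION * TOKENS_PER_SEC / 1e6
profit = mtok_served * MARGIN_PER_MTOK - GPU_COST
print(f"{mtok_served:,.0f} MTok served, lifetime profit ${profit:,.0f}")
# positive here, but very sensitive to the assumptions
```

Nudge the utilization or the per-token margin down a little and the same hardware never recoups its cost, which is exactly the uncertainty being described.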

IF they aren't running their own infrastructure, then it's much easier to claim that it's profitable with a margin (outside of the cost of running the servers that manage the rented infrastructure).

HOWEVER, a lot of data centers offer some pretty crazy low GPU prices, possibly vying for user base and revenue over profitability. In that case, if data center buildout starts slowing, it's very likely GPU rental prices go up and inference stops being profitable for the OpenRouter providers.

So long term it's not clear how profitable even these open models are.

OpenAI and Anthropic definitely fall into the latter category too. Their infrastructure requirements are much higher than those of the open models, and they are being given huge discounts so Microsoft/Amazon/Google can claim the revenue (since those companies are profitable elsewhere). It's not clear whether OpenAI's and Anthropic's models would be profitable at inference if they were paying rates that cloud hosts could profit from.

There are just way too many dimensions to this scenario to flatly state that OpenRouter proves inference is profitable at scale.