Hacker News

Someone1234 · yesterday at 5:52 PM

Does anyone with more insight into the AI/LLM industry happen to know if the cost to run them in normal user workflows is falling? The reason I'm asking is that "agent teams," while a cool concept, are largely constrained by the economics of running multiple LLM agents (i.e. the plans/API calls that make this practical at scale are expensive).

A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers, and I don't know if that has changed with more efficient hardware/software improvements/caching.


Replies

simonw · yesterday at 6:01 PM

The cost per token served has been falling steadily over the past few years across basically all of the providers. OpenAI dropped the price they charged for o3 to 1/5th of what it was in June last year thanks to "engineers optimizing inferencing", and plenty of other providers have found cost savings too.

Turns out there was a lot of low-hanging fruit in terms of inference optimization that hadn't been plucked yet.

> A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers

Where did you hear that? It doesn't match my mental model of how this has played out.

Aurornis · yesterday at 6:20 PM

> A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers

This gets repeated everywhere but I don't think it's true.

The company is unprofitable overall, but I don't see any reason to believe that the price they charge per token is below the marginal cost of computing those tokens.

It is true that the company is unprofitable overall when you account for R&D spend, compensation, training, and everything else. This is a deliberate choice that every heavily funded startup should be making; otherwise you're wasting the investment money. That's precisely what the investment money is for.

However, I don't think using their API and paying for tokens has negative value for the company. We can compare with models like DeepSeek, where providers charge a fraction of the price of OpenAI tokens and are still profitable. OpenAI's inference costs are going to be higher, but they're charging such a high premium that it's hard to believe they're losing money on each token sold. I think every token paid for moves them incrementally closer to profitability, not away from it.

nodja · yesterday at 8:08 PM

> A year or more ago, I read that both Anthropic and OpenAI were losing money on every single request even for their paid subscribers, and I don't know if that has changed with more efficient hardware/software improvements/caching.

This is obviously not true; you can check it with real data and common sense.

Just look up a similarly sized open-weights model on OpenRouter and compare the prices. You'll note the similarly sized model is often much cheaper than what Anthropic/OpenAI charge.

Example: let's compare the Claude 4 models with DeepSeek. Claude 4 is ~400B params, so it's best to compare it with something like DeepSeek V3, which is 680B params.

Even if we compare the cheapest Claude model to the most expensive DeepSeek provider, Claude charges $1/M for input and $5/M for output, while DeepSeek providers charge $0.40/M and $1.20/M, roughly a fifth of the price. You can get it as cheap as $0.27/M input and $0.40/M output.

As you can see, even if we skew things in Claude's favor, the story is clear: Claude token prices are much higher than they need to be. The difference in prices is because Anthropic also needs to recoup training costs, while OpenRouter providers only need to worry about making serving the model profitable. DeepSeek also isn't as capable as Claude, which puts downward pressure on its prices.
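The price gap above compounds over a real workload. A quick sketch using only the per-million-token figures quoted in this comment; the 10M-input/2M-output workload size is a made-up example, not from any provider:

```python
def cost_usd(input_tokens, output_tokens, in_price, out_price):
    """Total cost given per-million-token prices in USD."""
    return (input_tokens * in_price + output_tokens * out_price) / 1e6

# Hypothetical agent workload: 10M input tokens, 2M output tokens.
workload = (10_000_000, 2_000_000)

claude   = cost_usd(*workload, in_price=1.00, out_price=5.00)  # cheapest Claude
deepseek = cost_usd(*workload, in_price=0.40, out_price=1.20)  # pricier DeepSeek provider
cheapest = cost_usd(*workload, in_price=0.27, out_price=0.40)  # cheapest DeepSeek provider

print(f"Claude:   ${claude:.2f}")    # $20.00
print(f"DeepSeek: ${deepseek:.2f}")  # $6.40
print(f"Cheapest: ${cheapest:.2f}")  # $3.50
```

Same workload, roughly a 3-6x spread depending on which DeepSeek provider you pick.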

There's still a chance that Anthropic/OpenAI models are losing money on inference, if, for example, they're somehow much larger than expected (the 400B-param number isn't official, just speculation based on how the model performs). This also only takes API prices into account; subscriptions and free users will of course skew the real profitability numbers.

Price sources:

https://openrouter.ai/deepseek/deepseek-v3.2-speciale

https://claude.com/pricing#api

Havoc · yesterday at 6:01 PM

Saw a comment earlier today about Google seeing a big (50%+) fall in Gemini serving cost per unit across 2025, but I can't find it now. It was either here or on Reddit.

m101 · yesterday at 8:02 PM

I think actually working out whether they are losing money is extremely difficult for current models, though you can look backwards. The big uncertainties are:

1) How do you depreciate a new model? What is its useful life? (You only know this once you deprecate it.)

2) How do you depreciate your hardware over the period you trained this model? Another big unknown, not settled until you finally write the hardware off.

The easy thing to calculate is whether you are making money actually serving the model. The answer is almost certainly yes from this perspective, but that's missing a large part of the cost and is therefore the wrong number.
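The gap between those two views can be sketched with completely made-up numbers; every figure below is a hypothetical assumption, chosen only to show how a positive serving margin can flip negative once amortized training cost is included:

```python
# Hypothetical monthly figures, in $M -- illustrative only.
serving_revenue = 100.0   # API revenue
serving_cost    = 40.0    # inference compute

training_cost  = 1000.0   # one-off training spend for the model
useful_life_mo = 12       # assumed months before the model is deprecated

serving_margin = serving_revenue - serving_cost          # the "easy" number
amortized_training = training_cost / useful_life_mo      # depends on uncertainty (1)
fully_loaded = serving_margin - amortized_training

print(serving_margin)             # 60.0  -> profitable on serving alone
print(round(fully_loaded, 1))     # -23.3 -> unprofitable once training is amortized
```

Note the sign of `fully_loaded` swings entirely on `useful_life_mo`, which, as the comment says, is only known in hindsight.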

3abiton · yesterday at 6:01 PM

It's not just that. Everyone is complacent about how AI agents get used. I have been using AI for coding for quite a while, and most of my "wasted" time goes to correcting its trajectory and guiding it through the thinking process. The iterations are very fast, but it can easily go off track. The Claude family is pretty good at doing chained tasks, but once a task becomes too big context-wise, it's impossible to get it back on track. Cost-wise, it's cheaper than hiring skilled people, that's for sure.

zozbot234 · yesterday at 6:02 PM

> i.e. plans/API calls that make this practical at scale are expensive

Local AIs make agent workflows a whole lot more practical. Making the initial investment in a good homelab/on-prem setup will effectively become a no-brainer given the advantages in privacy and reliability, and you don't have to fear rug pulls or VCs playing the "lose money on every request" game, since you know exactly how much you're paying in power costs for your overall load.
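The "known power cost" point is easy to put a number on. A rough sketch where all three inputs are hypothetical (a 700W GPU box, $0.15/kWh electricity, ~50 tokens/sec sustained throughput); substitute your own hardware's figures:

```python
# Assumed figures for a local inference box -- replace with your own.
watts = 700               # sustained power draw of the machine
usd_per_kwh = 0.15        # local electricity price
tokens_per_sec = 50       # sustained generation throughput

cost_per_hour = (watts / 1000) * usd_per_kwh   # kWh/hour * $/kWh
tokens_per_hour = tokens_per_sec * 3600

cost_per_million = cost_per_hour / tokens_per_hour * 1e6
print(f"${cost_per_million:.3f} per million tokens")  # $0.583
```

This ignores the amortized hardware cost, which usually dominates; it only captures the marginal power cost the comment is talking about.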

KaiserPro · yesterday at 7:05 PM

Gemini-pro-preview is on Ollama and requires an H100, which is ~$15-30k. Google is charging $3 per million tokens. Supposedly it's capable of generating between 1 and 12 million tokens an hour.

Which is profitable, but not by much.
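Running the comment's own numbers through a break-even check; the 3-year amortization period and the choice of the $15-30k midpoint are my assumptions, and power, hosting, and staff costs are ignored:

```python
gpu_cost = 22_500                    # midpoint of the quoted $15-30k H100 range
price_per_million = 3.0              # quoted $3/M tokens
amortization_hours = 3 * 365 * 24    # assume 3-year useful life, running 24/7

hourly_hw_cost = gpu_cost / amortization_hours   # ~$0.86/hour, hardware only

for m_tokens_per_hour in (1, 12):    # quoted 1-12M tokens/hour throughput range
    revenue = m_tokens_per_hour * price_per_million
    print(f"{m_tokens_per_hour}M tok/h: ${revenue:.2f}/h revenue vs ${hourly_hw_cost:.2f}/h hardware")
```

Against hardware alone the margin looks comfortable even at the low end; how thin it gets in practice depends on the utilization, power, and hosting costs this sketch leaves out.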

Bombthecat · yesterday at 6:34 PM

That's why Anthropic switched to TPUs: you can sell at cost.

WarmWash · yesterday at 7:19 PM

These are intro prices.

This is all straight out of the playbook. Get everyone hooked on your product by being cheap and generous.

Then raise the price to recoup what you gave away, plus cover current expenses and profit.

In no way, shape, or form should people think these $20/mo plans are going to be the norm. Given OpenAI's marketing plan and a general 5-10 year ROI horizon for AI investment, we should expect AI use to cost $60-80/mo per user.
