The current AI pricing was always going to go away

67 points • by arnon • today at 11:24 AM • 82 comments

Comments

This is where open source models are important.

The latest DeepSeek V4 Pro model is 2-5x cheaper than Claude Sonnet 4.6. Cursor's Compose 2.5, which was released recently, is 6x cheaper than Sonnet.

The state-of-the-art models are going to get better and more expensive, and smaller models are going to get cheaper.

There will be a point where humans cannot tell the intelligence of the cheap models from that of the state-of-the-art models, just as I cannot tell the difference between Terence Tao and my university math professor.

I don't always need the smartest and most expensive model. I will need it every once in a while and will gladly pay that price if I have to. What I do need is a model that will solve my current problem in a reasonable amount of time.

_fat_santa • today at 3:28 PM

I wonder how much of Uber blowing their AI budget and MSFT pulling their Claude Code licenses can be attributed to "tokenmaxxing".

When Meta announced token leaderboards and others followed, I could see this being the logical conclusion. That whole trend is so dumb because it leads to this.

Company announces they will measure developer performance by how many tokens they burn and constantly talks about how the best developers burn the most tokens. Developers see the message and start burning tokens. And then the company acts surprised when their bills go through the roof.

I personally use my OpenAI subscription pretty heavily, 2-3 agents running practically all day on various tasks but I never even get close to running into limits while I hear about others blowing through limits on multiple accounts in the same time period. I'm convinced that most of those folks and their elaborate workflows aren't really for productivity but for bragging rights about how much they use AI.

tekacs • today at 4:24 PM

> Anthropic’s CFO testified under oath this March that the company spent $10 billion on compute and made $5 billion in revenue (Ed Zitron has the math). The labs are underwater on inference. They’re raising prices to keep the lights on.

'The labs are underwater on inference' is an absurd thing to say whilst not separating the cost of _compute_ out into training and inference.
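
The point about separating compute into training and inference can be made concrete with a rough sketch. It uses only the two figures quoted above ($10 billion compute, $5 billion revenue). The training share is a hypothetical parameter, since the quote does not give the split, and the sketch assumes all revenue comes from inference.

```python
def inference_margin(revenue, total_compute, training_share):
    """Gross margin on inference if `training_share` of compute went to training."""
    inference_cost = total_compute * (1.0 - training_share)
    return (revenue - inference_cost) / revenue


# Figures quoted from the article, in billions of dollars.
REVENUE = 5.0
TOTAL_COMPUTE = 10.0

# Hypothetical training shares; the real split is not given in the quote.
for share in (0.0, 0.25, 0.5, 0.75):
    margin = inference_margin(REVENUE, TOTAL_COMPUTE, share)
    print(f"training share {share:.0%}: inference margin {margin:+.0%}")
```

Under these assumptions, inference breaks even once half of the compute spend is training, so the quoted totals alone do not show whether inference is underwater.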

extr • today at 3:48 PM

What is the OP talking about? $/unit intelligence is going down rapidly. You can achieve what would have been considered miracles in 2022 with < $10.

abtinf • today at 3:47 PM

Insofar as I can tell, inference is on a certain path toward becoming "free". The models are now extremely powerful on high-end consumer hardware, and the efficiency trend seems likely to continue.

Here is a recent non-rigorous benchmark I ran against a bunch of models. Qwen3.6 35B A3B fine-tuned with Opus data runs plenty fast on my local machine and produces outstanding results - easily in the top 5, comparable to GPT 5.5 Pro (which is $180/mtok).

https://gistpreview.github.io/?31d66ef69e4aed3efae1aec69d86c...

I've predicted for years now that the industry will head down the path of the virus scanning vendors: selling subscriptions to be able to download the latest versions of models. I simply don't see how any other business model is remotely viable, except at the very highest end of inference or video gen.

ibtheory • today at 6:34 PM

This will also open up some good opportunities in the optimization space. Smaller, cheaper models plus optimization can bring performance up, especially in certain domain-specific applications of AI.

koliber • today at 4:42 PM

EDIT: [ IGNORE THIS COMMENT -- IT IS WRONG - I had a "bad math moment" myself ]

The math seems off. How is 7.8 million vs 4 million 95% more expensive? The article makes good points, but I doubt the numbers as they don’t add up.

Still agree with the conclusion though.
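
The correction in the edit note can be checked directly. This is a small sketch that assumes the article's "95%" means the relative increase from 4 million to 7.8 million; the function name is made up for illustration.

```python
def percent_increase(old, new):
    """Relative increase from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# 4 million vs 7.8 million, as quoted in the comment above.
increase = percent_increase(4.0, 7.8)
print(f"{increase:.0f}% more expensive")
```

7.8 is 1.95 times 4, which is an increase of 95%, so the article's figure holds.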

shay_ker • today at 4:37 PM

In the three options OP presents, I wonder if there's a fourth: BYO model

Customers give vendors metered access to their model. They can budget tokens per vendor. Vendors selling "AI products" can have a cleaner story and win on the margin.

The first step is to iron out a reasonable protocol, basically authorizing an access token, and then the model providers (OpenAI, Anthropic, etc.) do the rate limiting. Theoretically this could be done by OpenRouter too.

But even so - do customers want an "AI product" packaged cleanly, or do they want to manage token capacity? They may be forced to do the latter....
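
The "budget tokens per vendor" idea could look something like the sketch below. Everything here is hypothetical: the class, the vendor names, and the limits are invented for illustration and do not correspond to any real provider API. In a real BYO setup the model provider would enforce this on its side.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a vendor asks for more tokens than it has left."""


@dataclass
class TokenBudget:
    """Tracks the tokens a customer has allotted to each vendor (hypothetical)."""
    limits: dict[str, int]
    used: dict[str, int] = field(default_factory=dict)

    def remaining(self, vendor: str) -> int:
        # Vendors without an explicit limit get a budget of zero.
        return self.limits.get(vendor, 0) - self.used.get(vendor, 0)

    def charge(self, vendor: str, tokens: int) -> int:
        """Record usage and return the tokens left, or raise if over budget."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if tokens > self.remaining(vendor):
            raise BudgetExceeded(f"{vendor} has {self.remaining(vendor)} tokens left")
        self.used[vendor] = self.used.get(vendor, 0) + tokens
        return self.remaining(vendor)


# Made-up vendors and limits.
budget = TokenBudget(limits={"vendor-a": 1_000_000, "vendor-b": 250_000})
print(budget.charge("vendor-a", 400_000))  # tokens left for vendor-a
```

The customer keeps one ledger per vendor and rejects requests that would exceed it, which is the part of the protocol that decides who manages token capacity.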

alligatorplum • today at 4:05 PM

I seldom use my PC ever since I got a laptop. With the cost per token increasing, along with the random "features" where models will just eat through your tokens in one hour, I have really been tempted to turn my PC into a server for running local models.

mark_l_watson • today at 5:00 PM

> "which use cases earn the inference cost they burn?"

That is the question. I love using OpenCode with paid inference providers and seeing the cost of every little thing I do. On the other hand, right now I am flipping between Antigravity CLI and the two Antigravity apps burning Claude Opus tokens like crazy, knocking off a ton of work. Google must be losing money on me.

hereme888 • today at 5:31 PM

NVIDIA’s published specs imply much larger gains in NVFP4 inference compute and GPU memory bandwidth than in BOM cost.

That said, more intelligence and automation = higher costs.

infecto • today at 3:50 PM

Has this not been true for a long time now? Most companies have had enterprise/business-level pricing that has been closely tied to usage for what feels like at least a year.

anonymousiam • today at 4:37 PM

Not mentioned in the article/blog was the local alternative. Many applications will run just fine locally and not in the cloud. This is also more secure. Running local will probably eventually become the norm. It makes me wonder about the future of all these VC funded AI companies...

throwa356262 • today at 3:06 PM

This is only true if your world is limited to OpenAI, Anthropic, and the like.

There are a whole bunch of companies elsewhere in the world that are getting better and cheaper every month, hardware side included, all without the infinite VC money.

xnx • today at 5:19 PM

Lost me at "Ed Zitron has the math"

energy123 • today at 4:10 PM

Capex and revenue should not be compared like this, unless revenue is small and not growing.

dtagames • today at 2:47 PM

Some of these coming price increases will move dev work back to dedicated shops and teams when individuals and non-devs won't want to pay the AI bill to finish and ship their projects.

An outside small dev shop or internal dev team can pay these prices and spread the cost over several customers or departments, but the era of giving everyone AI and telling them to dev stuff is about to be over.

MarkusQ • today at 4:55 PM

This is just wrong.

The pricing so far has been a classic case of loss-leader to build market share and ramp up until you can find a moat. Normally, the huge cost of training would provide such a moat, or the amount of training data required, but both of those seem to have been overcome by enough players to keep the ball in play. The next target to keep out the riffraff seems to be "Gigawatts of Data Center" (gack, I hate that metric!) and you might think that it would hold, given the finite size of the planet.

But in space, no one can hear you bleed cash, so...

Havoc • today at 3:45 PM

Inference costs absolutely did fall. And even more so when looking at intelligence it buys you.

E.g., compare GPT-3.5 to the latest DeepSeek: the latter is both cheaper and more capable.

pacman1337 • today at 4:00 PM

I get similar results for deepseek and opus but opus is way faster. I guess deepseek streams thinking and makes it slower?

plaidfuji • today at 3:59 PM

kind of sobering to realize that whether your job can be profitably automated away comes down to what $/token some hyperscale AI provider can deliver… I suppose it’s nice that this article highlights some upward pressure on that number.

DeathArrow • today at 5:36 PM

I use cheap Chinese models. For all I care, both OpenAI and Anthropic can raise their prices until they'll have no customers left.

kittikitti • today at 5:10 PM

Thank you for sharing this article. I think the graphs in it were useful in understanding the different pricing structures. One thing that I would have included is pricing based on AI that I own, through capital expenditure (CapEx).

However, it's much harder to compare. For one, the cost per token is difficult to measure until enough time has passed for an extrapolation to be accurate. Also, there are performance considerations where a local solution might be more or less accurate than an equivalent online AI. In addition, the reduced compliance risk is hard to quantify, and in some cases compliance requirements make online AI practically unusable.
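
A rough way to put owned hardware on the same $/token footing is to amortize the purchase over its lifetime. The sketch below is only that: every input is a hypothetical placeholder, and it simplifies by counting electricity only while the machine is busy and ignoring idle draw, maintenance, and resale value.

```python
def owned_cost_per_mtok(capex, lifetime_years, power_kw, price_per_kwh,
                        tokens_per_second, utilization):
    """Amortized dollars per million tokens for hardware you own."""
    hours = lifetime_years * 365 * 24
    busy_hours = hours * utilization              # hours actually generating tokens
    total_tokens = tokens_per_second * busy_hours * 3600
    energy_cost = power_kw * busy_hours * price_per_kwh
    return (capex + energy_cost) / total_tokens * 1_000_000


# Hypothetical inputs: a $3,000 machine kept for 3 years, drawing 0.4 kW
# at $0.15/kWh, producing 50 tokens/s, and busy 20% of the time.
cost = owned_cost_per_mtok(3000, 3, 0.4, 0.15, 50, 0.2)
print(f"${cost:.2f} per million tokens")
```

The result is most sensitive to `utilization`, which is exactly the quantity that is hard to know until enough time has passed.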

I don't understand how people got buy-in for a business model that assumed token costs would go down indefinitely. All tech startups follow a blitz-scaling pattern where they practically give away their services for free, trap customers in a moat, and then extort as much money as they can.

fallpeak • today at 3:28 PM

This is slightly more tasteful slop than average (I'm thinking probably Claude rather than ChatGPT?), but it's still 100% AI written: https://www.pangram.com/history/c55ab69b-e0a9-49a0-8056-2fcd...

yogthos • today at 4:39 PM

My expectation is that local models will be the default for coding within a year or two. You can already run Qwen 3.6 with MTP at a pretty reasonable speed without needing a huge amount of VRAM. And while it's not as good as current frontier models, it's already quite competent for a lot of tasks.

And there's no sign that people are running out of ideas for how to optimize models further. You see a bunch of papers come out literally every few weeks right now. So, it's entirely plausible to me that we'll see models that are superior to current frontier ones in a year or two that will run on your machine.

Once we get to that point, I don't think it's even going to matter if frontier models keep improving for most people. Being able to run the model on your machine, use it as much as you want in any way you want, without having to worry about it changing from under you or the company changing pricing, and not have to send all your data to the vendor are going to be the deciding factors.

At some point the models are just good enough to do what you need to do. On top of that, I expect tooling around models and coding patterns will evolve as well. That could compensate significantly for the capabilities of the model. We already see this happening with two prime examples here:

https://github.com/itigges22/ATLAS

https://arxiv.org/abs/2509.16198

anthonypasq • today at 4:12 PM

Guys, we are in the mainframe era of AI. People in the '60s thought computing was expensive too, and the idea of having a computer on every desk, never mind in every pocket, never mind in basically every single piece of electronics in the world, seemed like a complete pipe dream.

If you told someone in the '70s their toaster would have a supercomputer in it, they would think you were crazy. In 10 years your doorknob is going to have a local AI model in it.

This is computing 2.0, not the dot-com bubble. 90% of inference will be at the edge in the future, and there will still be supercomputers and giant clusters doing cutting-edge science and research, but for 90% of use cases you'll just need a tiny local model, the same reason you don't need a giant GPU in your smart TV.

alfiedotwtf • today at 4:06 PM

> Memory for 4x expensive

> Did we collectively forget second order thinking?

I bought 2x 16GB NVIDIA cards this week because I don’t see hardware getting cheaper anytime soon, and because of that I totally don’t see the point of “waiting until prices go lower for graphics cards”, because that might not happen for a long time yet!

In fact, if you factor in world events (and the ones that haven’t happened yet but eventually will, e.g. China’s long-planned 2027 takeover of Taiwan), then there’s no way graphics card prices are going to be accessible to mere mortals until at least 2028.

But my real reasoning is that you’re going to see a flood of OpenAI and Anthropic users leave because of a) increasingly expensive pricing plans, and b) impending business laws on the horizon about protecting sovereign data from AI (i.e. data in the cloud for training is a no-no).

So what happens when people and companies one by one start leaving the SOTA AI cloud for from-good-enough-to-wow models? RAM and graphics cards become the new toilet paper, which is going to double current prices again.

Upgrade now before it’s too late folks!

YetAnotherNick • today at 3:53 PM

You are comparing two different models. It's like saying the Roadster is more expensive than the Model S. No model's pricing actually increased, and I am using GPT-4o at the same price as before.

You can see price vs. performance on Artificial Analysis, and the Pareto-optimal models are all just 6 months old.
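
For anyone unfamiliar with "Pareto optimal" here: a model is on the frontier if no other model is at least as cheap and at least as capable while being strictly better on one of the two. A minimal sketch, using made-up names, prices, and scores rather than real Artificial Analysis data:

```python
def pareto_frontier(models):
    """Return names of models not dominated on (lower price, higher score)."""
    frontier = []
    for name, price, score in models:
        # Dominated: some model is no worse on both axes and strictly better on one.
        dominated = any(
            p <= price and s >= score and (p < price or s > score)
            for _, p, s in models
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Made-up (name, price per million tokens, benchmark score) entries.
models = [
    ("model-a", 10.0, 80),
    ("model-b", 2.0, 75),
    ("model-c", 12.0, 78),
    ("model-d", 1.0, 60),
]
print(pareto_frontier(models))  # ['model-a', 'model-b', 'model-d']
```

In this toy data, model-c drops out because model-a is both cheaper and scores higher. The claim in the comment is that older models end up in that position against newer ones.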

adamesque • today at 3:31 PM

It's hard to take this piece seriously if he's citing _Ed Zitron's_ math, and equally hard to make the blanket statement that flat-rate plans = "the current AI pricing". But yes, those pricing models were pretty silly and unsustainable.

paralleliq • today at 4:40 PM

[dead]

vitalysemenov • today at 3:55 PM

[flagged]

vdelpuerto • today at 12:08 PM

[dead]
