Hacker News

Are the costs of AI agents also rising exponentially? (2025)

159 points by louiereederson last Wednesday at 1:47 PM | 38 comments

Comments

easygenes today at 5:20 AM

While I understand why they used the METR data, a cleaner comparison would be against the current cost-optimal frontier of open models (e.g. GLM-5.1 and MiniMax-M2.7). That paints a very different picture. Comparing only the frontier models at the time of the METR report invariably means looking at providers who were pushing the limits of cost at that time.

GPT-5 was shown as being on the costly end, surpassed by o3 at over $100/hr. I can't directly compare to METR's metrics, but a good proxy is the cost of running the Artificial Analysis suite. GLM-5.1 costs less than half as much as GPT-5 to complete the suite and is dramatically more capable than both GPT-5 and o3.

So while their analysis is interesting, it points towards the frontier continuing to test the limits of acceptable pricing (as Mythos is clearly reinforcing) and the lagging 6-12 months of distillation and refinement continuing to bring the cost of comparable capabilities to much more reasonable levels.

thelastgallon today at 1:31 AM

> On many task lengths (including those near their plateau) they cost 10 to 100 times as much per hour. For instance, Grok 4 is at $0.40 per hour at its sweet spot, but $13 per hour at the start of its final plateau. GPT-5 is about $13 per hour for tasks that take about 45 minutes, but $120 per hour for tasks that take 2 hours. And o3 actually costs $350 per hour (more than the human price) to achieve tasks at its full 1.5 hour task horizon. This is a lot of money to pay for an agent that fails at the task you’ve just paid for 50% of the time — especially in cases where failure is much worse than not having tried at all.
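One rough way to read the o3 figure in that quote is as an expected cost per successful task. This is only a sketch under stated assumptions, not the article's method: it assumes the hourly rate applies to the full task length, that a failed attempt is simply retried, and that attempts succeed independently with the same probability. The function name `cost_per_success` is made up for illustration.

```python
def cost_per_success(rate_per_hour: float, task_hours: float, success_prob: float) -> float:
    """Expected spend to obtain one successful completion.

    Assumes independent retries with a fixed success probability, so the
    expected number of attempts is 1 / success_prob (geometric distribution).
    """
    if not 0 < success_prob <= 1:
        raise ValueError("success_prob must be in (0, 1]")
    cost_per_attempt = rate_per_hour * task_hours
    return cost_per_attempt / success_prob


# Figures taken from the quoted passage: o3 at $350/hour on a 1.5 hour task,
# succeeding 50% of the time.
o3_expected = cost_per_success(rate_per_hour=350, task_hours=1.5, success_prob=0.5)
print(f"${o3_expected:,.0f} expected per successful task")  # $1,050
```

This ignores the quote's last point, that a failure can be worse than not trying at all, so if anything it understates the real cost.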

dang yesterday at 9:42 PM

Related ongoing thread:

Measuring Claude 4.7's tokenizer costs - https://news.ycombinator.com/item?id=47807006 (309 comments)

quicklywilliam today at 12:16 AM

Interesting read. I don't know if I quite buy the evidence, but it's definitely enough to warrant further investigation. It also matches up with my personal experience, which is that tools like Claude Code are burning through more and more tokens as we push them to do bigger and bigger work. But we all know the frontier model companies are burning through money in an unsustainable race to get you and your company hooked on their tools.

So: I buy that the cost of frontier performance is going up exponentially, but that doesn't mean there is a fundamental link. We also know that benchmark performance of much smaller/cheaper models has been increasing (as far as I know METR only looks at frontier models), so that makes me wonder if the exponential cost/time horizon relationship is only for the frontier models.

agentifysh today at 12:42 AM

Until there is some drastic new hardware, we are going to see a situation similar to proof of work, where a small group hoards the hardware and can collude on prices.

The difference is that current prices are heavily subsidized by OPM.

Once the narrative changes to something more realistic, I can see prices increasing across the board. Forget $200/month for codex pro; expect $1000/month or something similar.

So it's a race between a new supply of hardware, with new paradigm shifts, that can hit the market and the tide going out in the financial markets.

greenmilk yesterday at 11:26 PM

Are any inference providers currently making a profit (on inference; I know Google makes money)?

matt3210 today at 1:02 AM

I took a month-long break and my side project took twice as many tokens.

siliconc0w today at 3:08 AM

Working on an OSS tool to help orgs identify where they can save on token costs: https://repogauge.org

Happy to run it on your repos for a free report: [email protected]

noosphr today at 2:50 AM

Yet again: Transformers are fundamentally quadratic.

If they can do a task that takes 1 unit of computation for 1 dollar, they will cost 100 dollars for a 10-unit task and 10,000 dollars for a 100-unit task.

Project costs from Claude Code bear this out in the real world.
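The arithmetic in that comment can be checked with a minimal sketch. It assumes, as the comment does, that cost grows with the square of task size and nothing else; real bills also depend on things like caching and context management, which this ignores. The name `quadratic_cost` is hypothetical.

```python
def quadratic_cost(units: float, unit_cost: float = 1.0) -> float:
    """Cost of a task under a pure quadratic scaling assumption.

    unit_cost is the price of a 1-unit task (1 dollar in the comment).
    """
    return unit_cost * units ** 2


for units in (1, 10, 100):
    total = quadratic_cost(units)
    # Under this assumption the cost per unit of work grows linearly with size.
    print(f"{units:>3} units: ${total:>9,.0f} total, ${total / units:>5,.0f} per unit")
```

Under this assumption a task 10 times larger costs 100 times more, so the price per unit of work rises in step with task size rather than staying flat.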
