LLMs exist on a logaritmhic performance/cost frontier. It's not really clear whether Opus 4.5+ represent a level shift on this frontier or just inhabits place on that curve which delivers higher performance, but at rapidly diminishing returns to inference cost.
To me, it is hard to reject this hypothesis today. The fact that Anthropic is rapidly trying to increase price may betray the fact that their recent lead is at the cost of dramatically higher operating costs. Their gross margins in this past quarter will be an important data point on this.
I think the tendency for graphs of model assessment to display the log of cost/tokens on the x axis (i.e. Artificial Analysis' site) has obscured this dynamic.
> It's not really clear whether Opus 4.5+ represent a level shift on this frontier or just inhabits place on that curve which delivers higher performance, but at rapidly diminishing returns to inference cost.
I think we're reaching the point where more developers need to start right-sizing the model and effort level to the task. It was easy to get comfortable with using the best model at the highest setting for everything for a while, but as the models continue to scale and reasoning token budgets grow, that's no longer a safe default unless you have unlimited budgets.
I welcome the idea of having multiple points on this curve that I can choose from. depending on the task. I'd welcome an option to have an even larger model that I could pull out for complex and important tasks, even if I had to let it run for 60 minutes in the background and made my entire 5-hour token quota disappear in one question.
I know not everyone wants this mental overhead, though. I predict we'll see more attempts at smart routing to different models depending on the task, along with the predictable complaints from everyone when the results are less than predictable.
They're also getting closer to IPO and have a growing user base. They can't justify losing a very large number of billions of other people's money in their IPO prospectus.
So there's a push for them to increase revenue per user, which brings us closer to the real cost of running these models.
This is a bad take. It's not really wrong in the sense that yes higher performance does cost more.
But it ignores completely the fact that the same intelligence is dropping by an order of magnitude (at least) every 12 months.
GPT o1 launched at $600/M output tokens and GPT4.5 launched at $150/M.
Opus 4.7 is $25/M for more intelligence
That sounds very plausible. But it implies they could offer even higher performance models at much higher costs if they chose to; and presumably they would if there were customers willing to pay. Is that the case? Surely there are a decent number of customers who’d be willing to pay more, much more, to get the very best LLMs possible.
Like, Apple computers are already quite pricey -- $1000 or $2000 or so for a decent one. But you can spec up one that’s a bit better (not really that much better) and they’ll charge you $10K, $20K, $30K. Some customers want that and many are willing to pay for it.
Is there an equivalent ultra-high-end LLM you can have if you’re willing to pay? Or does it not exist because it would cost too much to train?
I mean, the signs have been there that the costs to run and operate these models wasn't as simple as inference costs. And the signs were there (and, arguably, are still there) that it costs way, way more than many people like to claim on the part of Anthropic. So to me this price hike is not at all surprising. It was going to come eventually, and I suspect it's nowhere near over. It wouldn't surprise me if in 2-3 years the "max" plan is $800 or $2000 even.
Yeah. Combine this with much of Corpos right now using a “burn as many tokens as you need” policy on AI, the incentive is there for them to raise price and find an equilibrium point or at least reduce the bleed.
Once they implement their models directly in silicon, the cost will come down and the speed will go up. See Taalas.
heh adaptive thinking is letting the meter run itself. they make more when it runs longer.
[dead]
> The fact that Anthropic is rapidly trying to increase price may betray the fact that their recent lead is at the cost of dramatically higher operating costs.
Or they are just not willing to burn obscene levels of capital like OpenAI.
I meant reference Toby Ord's work here. I think his framing of the performance/cost frontier hasn't gotten enough attention https://www.tobyord.com/writing/hourly-costs-for-ai-agents