rprend • yesterday at 5:01 PM • 1 reply

This is not true. API tokens are not sold at a loss, and hardware gets more efficient over time, so serving inference on the same model gets cheaper. Llama 3.1 405B cost $6/$12 per million tokens in 2024; in 2026 the same model costs $3/$3 per million tokens.
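To make the price comparison concrete, here is a minimal Python sketch that prices a hypothetical workload at both quoted rates. It assumes the first figure in each pair is the input price and the second is the output price. The workload size (10M input tokens, 2M output tokens) is made up for illustration.

```python
# Prices in USD per million tokens, as quoted in the comment.
# Assumption: each pair is (input price, output price).
PRICES = {
    2024: {"input": 6.0, "output": 12.0},
    2026: {"input": 3.0, "output": 3.0},
}


def workload_cost(year: int, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the quoted per-million-token prices."""
    p = PRICES[year]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical workload: 10M input tokens and 2M output tokens.
old = workload_cost(2024, 10_000_000, 2_000_000)
new = workload_cost(2026, 10_000_000, 2_000_000)
print(f"2024: ${old:.2f}, 2026: ${new:.2f}, reduction: {1 - new / old:.0%}")
# prints "2024: $84.00, 2026: $36.00, reduction: 57%"
```

How much a given workload saves depends on its input/output mix: under these prices input tokens got 2x cheaper and output tokens 4x cheaper, so output-heavy workloads benefit more.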

The most intelligent model at any given time is much larger than its predecessor, which is why GPT-5.5 tokens cost more than GPT-5.4 tokens. But you should expect that two years from now, serving a GPT-5.5-sized model will be cheaper than serving GPT-5.5 is today. Getting an equally intelligent model two years from now should be cheaper still, because distillation techniques are effective at reducing the parameter count needed to reach the same benchmark scores.

Replies

eikenberry • yesterday at 7:55 PM

So are they going to stop at GPT-5.5? This analysis only seems to count inference cost, when the majority of the cost, and the reason they are burning through money, is training.
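The reply's point can be illustrated by amortizing a model's training cost over the tokens it serves before being replaced. This is a rough Python sketch only: the training cost, inference price, and token volumes below are hypothetical placeholders, not figures for any real model.

```python
def cost_per_million_tokens(training_cost_usd: float,
                            inference_cost_per_million: float,
                            tokens_served: float) -> float:
    """Blended USD cost per million tokens, with training amortized over all tokens served."""
    millions_served = tokens_served / 1_000_000
    training_share = training_cost_usd / millions_served
    return inference_cost_per_million + training_share


# Hypothetical numbers: a $1B training run and $1 per million tokens to serve.
TRAINING_COST = 1e9
INFERENCE_PRICE = 1.0

# Model stays in service long enough to serve 1e15 tokens.
long_life = cost_per_million_tokens(TRAINING_COST, INFERENCE_PRICE, 1e15)
# Model is replaced after serving only 2e14 tokens.
short_life = cost_per_million_tokens(TRAINING_COST, INFERENCE_PRICE, 2e14)
print(long_life, short_life)  # prints "2.0 6.0"
```

With these made-up numbers, the training share rises from $1 to $5 per million tokens when the model serves five times fewer tokens before it is retired. If each new model restarts the training bill, cheaper inference alone does not settle the question.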
