> The state of the art models are going to get better and more expensive and smaller models are going to get cheaper.
Why do you think this will be true?
Right now I see the major US labs betting on gaining an advantage from having way more compute, and I see Chinese labs competing with one another in a resource-scarce environment, so they place much more emphasis on compute-efficiency.
But the supply chains that feed into the massive data center growth in the US are strained; there are energy, memory, and logistical bottlenecks, to name a few.
In the medium-to-long run, compute capacity cannot keep growing exponentially. Somehow it has for decades, but no exponential growth is infinite, and the point where it stops may be when the planet really starts to cook itself.
Maybe the US labs will become more compute-constrained, and then have to compete on efficiency.
Or maybe things change fundamentally in some other way I'm not thinking of.
Commoditize your complement: I expect to see this most in consumer AI (once that actually starts working...)
It will be important for Apple to have good-enough, cheap local LLMs that run on-device.
If the barrier to performance shifts from fundamental model capability to context collection and management, I would expect folks focused on that problem to keep driving open-weight LLM development in some shape or form.
> so they place much more emphasis on compute-efficiency.
Maybe on training, but at inference time they use more tokens than comparable Western models.
https://artificialanalysis.ai/?output-tokens=intelligence-vs...
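To make the tokens-vs-price point concrete, here's a rough sketch with made-up numbers (the token counts and per-token prices below are hypothetical, not taken from the linked chart). The cost of a task is roughly output tokens times price per token, so a cheap-per-token model that is verbose can end up costing the same as, or more than, a pricier model that is terse.

```python
def cost_per_task(output_tokens: int, usd_per_million_tokens: float) -> float:
    """Rough cost of one task: tokens generated times the per-token price."""
    return output_tokens * usd_per_million_tokens / 1_000_000


# All numbers below are hypothetical, chosen only to illustrate the arithmetic.
terse_pricey = cost_per_task(output_tokens=2_000, usd_per_million_tokens=10.0)
verbose_cheap = cost_per_task(output_tokens=10_000, usd_per_million_tokens=2.0)
very_verbose_cheap = cost_per_task(output_tokens=30_000, usd_per_million_tokens=2.0)

# 5x the tokens at 1/5 the price is a wash; beyond that the "cheap" model costs more.
print(terse_pricey, verbose_cheap, very_verbose_cheap)
```

So a lower sticker price per token only translates into cheaper inference if the token count per task doesn't grow by more than the price gap.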
The labs have a perverse incentive to make things as expensive compute-wise as possible. The only thing keeping this somewhat in check is competition, and that competition is intentionally being gatekept by locking up the supply of computing infrastructure. With three players it's pretty easy to collude, even if only indirectly. They can't burn trillions forever, and Nvidia's roughly 75% gross margins can't last forever either.
Things will normalize, but it will take time.