The question isn’t whether it works (it does); the question is whether there are buyers for hardware that is obsolete the day it ships. Models evolve much faster than hardware can keep up with.
One obvious use case is edge computing, such as in industrial applications that cannot tolerate the risk of a network link or cloud service going down. Even embedded use cases are possible, such as an image classifier model in a security camera.
Right, but there are problems that need to be solved routinely and that GLM 5.2 can already solve. Will the model be state of the art when the chip is released? No. But once it's out, you could optimize for it and let your solver run indefinitely at low cost, which is useful if the only problems you need solved cheaply are ones that model can handle.
And the high-water mark of what open models can solve will keep rising.
There may be all sorts of stable use cases where this could be interesting. Imagine permanent voice translation circuits at a tiny fraction of the current price, or glasses with long battery life that subtitle the world.
The models have to run on something or they're useless. They can't run on future hardware today, and people want to use models today. So, if hardware is obsolete the day it ships, we're all using obsolete hardware, and there's no alternative to that.
They are betting on fast release cycles and much lower costs (both purchase and operating), combined with the ability to apply dynamic fine-tunes on top of the static model.
Presumably at some point the rapid progress of models will plateau, at least to the point where a model could be frozen in time and remain economically useful for the expected life of the hardware, especially if it comes with compelling benefits, e.g. dramatically lower latency and/or dramatically higher performance per watt.
If you can build chips that run one specific LLM 100x faster than anything else, they would have use cases that nothing else could match.