This is where I see the economics of AI going:
* Inference becomes cheap
  - Specialty accelerators hit the market and a race to the bottom begins
* Training remains expensive
  - This works out for Anthropic and OpenAI: they go into the business of training
* Models become rental units or purchasable assets that you run on inference hardware
  - Rent or own the inference hardware
* Or you pay someone to do all of the above for you, at a premium
There's no magic bullet for inference on cheap accelerators. Any accelerator still requires large amounts of high-bandwidth memory to hold and stream the model's weights.
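To see why memory bandwidth, not compute, sets the floor, here is a rough back-of-envelope sketch. The function and the specific numbers (70B parameters, 8-bit weights, 20 tokens/sec) are illustrative assumptions, not figures from this post; it only captures the well-known lower bound that single-stream decoding must read every weight once per generated token.

```python
# Back-of-envelope: why cheap inference accelerators still need HBM.
# All numbers below are hypothetical, for illustration only.

def min_bandwidth_gb_s(params_billion: float, bytes_per_param: float,
                       tokens_per_sec: float) -> float:
    """Lower bound on memory bandwidth for single-stream decoding:
    every generated token requires reading all model weights once."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return weight_bytes * tokens_per_sec / 1e9

# E.g. a 70B-parameter model with 8-bit weights at 20 tokens/sec:
bw = min_bandwidth_gb_s(70, 1.0, 20)
print(f"{bw:.0f} GB/s")  # 1400 GB/s -- firmly in HBM territory
```

Batching amortizes the weight reads across requests, but for latency-sensitive, single-user decoding this bound is hard to escape, which is why the memory subsystem dominates the bill of materials no matter how cheap the compute dies get.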