Users have to pay for the compute somehow. Maybe by paying for models running in datacenters. Maybe by paying for hardware capable enough to run models locally.
But also: if Apple's approach works, it's incredibly wasteful. Server-side means shared resources, shared upgrades, and shared costs. The privacy aspect matters, but at what cost?
I can upgrade to a bigger LLM I use through an API with one click. If it runs on my device, I need to buy a new phone.