I think there’s a realistic chance consumer inference moves on-device, though it really depends on marketing.
My non-tech friends and family would probably be served perfectly fine by local models today, if they had a working web search tool. Their queries are often “soft” and don’t have an exact answer. My mom and aunt used it to pick a hairstyle, my mom used it to get an image of what a room would look like with particular drapes in it, etc. Stuff I think mid-sized local models like Gemma or smaller Qwens could do without issue. They just don’t have a device that will run them.
Businesses won’t move. They need a huge context window so they can stuff in a bunch of Confluence pages and 300 tools, and the model needs to read an entire codebase, yada yada. If they self-host, hardware depreciation and electricity will probably make it a wash at best, or even cost more than paying for API access.
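The economic argument in favor of cloud inference: higher utilization is always going to mean a better ROI on inference hardware. A shared accelerator that stays busy spreads its purchase price over far more tokens than a box that sits idle most of the day.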
But maybe that hardware becomes so commoditized that it's not difficult to obtain / stuff in a box.
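To make the utilization point concrete, here's a rough back-of-the-envelope sketch. Every number in it is a made-up placeholder, not a measurement or a real price, and the function name and parameters are mine. It also assumes the box only draws power while it's actually generating, which is a simplification.

```python
def cost_per_million_tokens(
    hardware_price: float,       # purchase price, written off over the lifetime
    lifetime_years: float,       # how long before the hardware is worth ~nothing
    power_watts: float,          # draw while generating (idle draw ignored here)
    electricity_per_kwh: float,  # price of one kWh
    tokens_per_second: float,    # throughput while busy
    utilization: float,          # fraction of the lifetime spent generating
) -> float:
    """Amortized hardware + electricity cost per one million generated tokens."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    busy_hours = lifetime_years * 365 * 24 * utilization
    tokens = tokens_per_second * busy_hours * 3600
    energy_cost = (power_watts / 1000) * busy_hours * electricity_per_kwh
    return (hardware_price + energy_cost) / (tokens / 1_000_000)


if __name__ == "__main__":
    # Hypothetical box: all values are illustrative assumptions.
    for utilization in (0.01, 0.10, 0.60):
        cost = cost_per_million_tokens(
            hardware_price=2000,
            lifetime_years=4,
            power_watts=300,
            electricity_per_kwh=0.15,
            tokens_per_second=30,
            utilization=utilization,
        )
        print(f"utilization {utilization:.0%}: {cost:.2f} per 1M tokens")
```

The shape is what matters: the hardware term shrinks as utilization goes up, which is the cloud's advantage, and it also shrinks as `hardware_price` drops, which is the commoditization scenario. The electricity term per token stays the same either way.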