Have you run models locally, especially on a phone? I have, and there are even apps like Google AI Edge Gallery that run Gemma for you. It works perfectly fine for use cases like summarizing emails; you don't really need the latest and greatest (i.e., biggest) models for tasks like these, in much the same way most people don't need the latest and greatest phone or laptop for their use cases.
And anyway, you already see models like Qwen 3.5 9B and 4B beating 30B and 80B parameter models, and models that size can already run on phones today, especially with quantization.
Benchmarks: https://huggingface.co/Qwen/Qwen3.5-4B
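To see why quantization is what makes phone-sized models practical, here's a toy sketch of symmetric int8 quantization in plain Python. The weight values and the back-of-the-envelope memory numbers are illustrative, not taken from any specific model; real runtimes (llama.cpp, AI Edge, etc.) do this per-tensor or per-block in optimized code.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid div-by-zero on all-zero weights
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

# Toy weight vector (illustrative values only).
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rough memory math for a 4B-parameter model:
# fp16 = 2 bytes/param, int8 = 1, int4 = 0.5.
fp16_gb = 4e9 * 2 / 1e9    # 8 GB -- too big for most phones
int8_gb = 4e9 * 1 / 1e9    # 4 GB -- borderline
int4_gb = 4e9 * 0.5 / 1e9  # 2 GB -- fits comfortably
```

The point: each weight costs 1 byte (or half a byte at int4) instead of 2, and the reconstruction error per weight is at most about half the scale, which is why a quantized 4B model fits in a phone's RAM with little quality loss.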
I'm going by the features Apple advertised in the iPhone 16 ad: take the phone out, point it at a restaurant, and ask it to a) analyze the video/image and b) understand what's going on.
Or pull out the phone and ask, "Who's the person I met on X day ..".