On-device models are the future. Users prefer them. No privacy issues. No dealing with connectivity, tokens, or changes to vendors' implementations. I have an app using the Foundation Models framework, and it works great. I only wish I could backport it to pre-macOS 26 versions.
I am concerned that local models will never benefit from the training on live requests that is surely improving cloud-only models.
This might be the cost of privacy, and it might be worth paying, unless cloud models reach an inflection point that makes local models archaic.
I think two recent advances make your statement more true. The new Qwen 3.5 series has shown relatively high intelligence density, and Google's new turboquant could result in dramatically smaller and more efficient models without the usual quantization accuracy tradeoff.
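For context on the tradeoff being referenced: standard post-training quantization shrinks a model by storing weights at lower precision, at the cost of rounding error. A minimal illustrative sketch of symmetric per-tensor int8 quantization (not turboquant itself, just the baseline technique it improves on):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor int8: map [-max|w|, +max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)  # stand-in for a weight tensor
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# 4x smaller (int8 vs float32), but each weight is off by up to scale/2.
print(w.nbytes, q.nbytes)
print(float(np.abs(w - w_hat).max()))
```

The memory win is fixed (4x here, more for int4), while the error grows with the dynamic range of the tensor; that error-vs-size curve is exactly what newer quantization schemes try to bend.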
I would expect consumer inference ASICs to emerge when model development starts plateauing, and "baking" a highly capable and dense model into a chip makes economic sense.
Obviously, hardware-wise, the real blocker is memory cost. But there is no reason why future devices couldn't bundle 256GB of memory by default.
Yes, but so far do we have a working practice for this? Given a local model, is there any infrastructure we could use that provides a good way to leverage it for local tasks?
Maybe in some more distant future. For me, I'm still struggling with the hallucinations and screw-ups that the state-of-the-art models give me.
These local models are far behind the capabilities of the latest Gemini Pro, Claude Opus, or GPT.
Why waste time with subpar AI?
Technologists make the same mistake over and over in thinking the better technology will win. VHS vs. Betamax, etc.
Actual consumers not only don't care, they will not even be aware of the difference.
I see all these LLM posts about if a certain model can run locally on certain hardware and I don’t get it.
What are you doing with these local models that run at x tokens/sec?
Do you have the equivalent of ChatGPT running entirely locally? What do you do with it? Why? I honestly don’t understand the point or use case.
Users don’t care about “privacy”. If they did, Meta and Alphabet wouldn’t be worth $1T+.
Users really don’t matter at all. The revenue for AI companies will be B2B, where the user is not the customer - including coding agents. Most people don’t even use computers as their primary “computing device”, and most people are buying crappy low-end Android phones - no, I’m not saying all Android phones are crappy. But that’s what most people are buying, with the average selling price of an Android phone being $300.