On-device models are the future. Users prefer them. No privacy issues. No dealing with connectivity, tokens, or changes to vendors' implementations. I have an app using the Foundation Models framework, and it works great. I only wish I could backport it to pre-macOS 26 versions.
I am concerned that local models will never benefit from the training on live requests that is surely improving cloud-only models.
This might be the cost of privacy, and it might be worth paying, unless cloud models reach an inflection point that makes local models archaic.
I think two recent advances make your statement more true. The new Qwen 3.5 series has shown relatively high intelligence density, and Google's new turboquant could result in dramatically smaller and more efficient models without the usual quantization accuracy tradeoff.
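For context on the tradeoff being referenced: standard post-training quantization shrinks a model by storing weights at lower precision, at the cost of rounding error. A minimal illustrative sketch of symmetric per-tensor int8 quantization (not turboquant itself, just the baseline technique it improves on):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor int8: map [-max|w|, +max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)  # stand-in for a weight tensor
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# 4x smaller (int8 vs float32), but each weight is off by up to scale/2.
print(w.nbytes, q.nbytes)
print(float(np.abs(w - w_hat).max()))
```

The memory win is fixed (4x here, more for int4), while the error grows with the dynamic range of the tensor; that error-vs-size curve is exactly what newer quantization schemes try to bend.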
I would expect consumer inference ASICs to emerge when model development starts plateauing, and "baking" a highly capable and dense model into a chip makes economic sense.
Obviously, hardware-wise, the real blocker is memory cost. But there is no reason why future devices couldn't bundle 256GB of memory by default.
Yes, but so far do we have a working practice for this? Given a local model, is there any infrastructure we could use that provides a good way to leverage it for local tasks?
Maybe in some more distant future. For me, I'm still struggling with the hallucinations and screw-ups that the state-of-the-art models give me.
These local models are far behind the capabilities of the latest Gemini Pro, Claude Opus, or GPT.
Why waste time with subpar AI?
Technologists make the same mistake over and over in thinking the better technology will win. VHS vs. Betamax, etc.
Actual consumers not only don't care, they will not even be aware of the difference.
I see all these LLM posts about if a certain model can run locally on certain hardware and I don’t get it.
What are you doing with these local models that run at x tokens/sec?
Do you have the equivalent of ChatGPT running entirely locally? What do you do with it? Why? I honestly don’t understand the point or use case.
Users don’t care about “privacy”. If they did, Meta and Alphabet wouldn’t be worth $1T+.
Users really don’t matter at all. The revenue for AI companies will be B2B, where the user is not the customer - including coding agents. Most people don’t even use computers as their primary “computing device”, and most people are buying crappy low-end Android phones - no, I’m not saying all Android phones are crappy. But that’s what most people are buying, with the average selling price of an Android phone being $300.