I feel like lots of people here are just commenting on the headline.
This isn't about the local models you're running on your old gaming rig, or the Tesla P40 rig you built for local LLMs.
This is about code leveraging the local resources where the code is running for its AI needs. Rather than making an API call to an external AI service, the code leverages the AI capabilities built into the hardware it runs on. With modern Apple, Intel, and AMD silicon all shipping dedicated AI acceleration, this is where, IMO, the focus should be heading.
How many FLOPS or whatever can your phone do? I bet it's enough to paint the walls of your living room, or draw a pretty good pelican on a bike.
Actually you can do way more than that. We have optimized it to process 2TB of high-def video on an M5 MBP in under 24 hours, including everything: speech understanding, face recognition, LLM, and VLM. Super fun.
> draw a pretty good pelican on a bike.
You mean the famously hard task? The one picked because it stretches frontier models to their limits?
A phone makes a very crappy AI inference rig. It's battery powered and can't even really run at 100% utilization on an ongoing basis due to how challenging the thermals are.
I was writing about just this last week for fun: an AI + hardware team-up to build localized AI with functions specialized to your organization. E.g., Adobe Studio AI in an on-premise box, made by Apple and powered by something like Cohere, with privacy:
https://www.notion.so/adeelkhamisa/Cohere-s-next-steps-to-be...
I just did something exactly like this. I have a self-hosted personal dashboard, and one of the APIs I'm reading gives slightly too verbose an output. So I added a feature to summarize the text using Qwen 3.5 2B, which happily runs on a CPU. I've never clocked the tokens per second because I only generate around 100 tokens an hour in a very narrow domain of knowledge, and speed isn't critical.
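For anyone wanting to try the same pattern, here's a minimal sketch, assuming a local Ollama-style server on its default port; the endpoint, model name, and prompt wording are all placeholders, not the commenter's actual setup:

```python
"""Summarize a verbose API response with a small local model.

Assumes an Ollama server is running at the default local endpoint.
The model name is illustrative -- swap in whatever small model you run.
"""
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
MODEL = "qwen2.5:1.5b"  # placeholder: any small CPU-friendly model


def build_prompt(verbose_text: str) -> str:
    # Keep the instruction narrow: speed isn't critical, but focus is.
    return (
        "Summarize the following status text in one short sentence:\n\n"
        + verbose_text
    )


def summarize(verbose_text: str) -> str:
    # Non-streaming request; fine when you only generate ~100 tokens/hour.
    payload = json.dumps({
        "model": MODEL,
        "prompt": build_prompt(verbose_text),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()


# Usage (requires a running server):
#   short = summarize(raw_api_text)
```

Calling the stock HTTP API keeps the dashboard code dependency-free; the tradeoff is you manage the model server yourself.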
And this is exactly what the LLM provider industry is fighting tooth and nail. It's not only because it doesn't directly contribute to their bottom line; it also directly opposes the idea that LLMs are going to replace entire workers rather than enhance the abilities of individual workers. What we're headed towards would have been a killer product, and probably still shifted a bunch of capital to the bazillionaires, had these companies set more realistic goals rather than banking on being the ones that won the war that "changed everything™".