I feel like lots of people here are just commenting on the headline.
This isn't about the local models you're running on your old gaming rig, or the Tesla P40 rig you built for local LLMs.
This is about code leveraging the local resources where the code is running for its AI needs. Rather than making an API call to an external AI service, the code leverages the AI capabilities built into the hardware it runs on. With modern Apple, Intel, and AMD silicon all shipping dedicated AI acceleration, this is where, IMO, the focus should be heading.
How many FLOPS or whatever can your phone do? I bet it's enough to paint the walls of your living room, or draw a pretty good pelican on a bike.
Actually you can do way more than that. We have optimized it to process 2TB of high-def video on an M5 MBP in under 24 hours, including everything: speech understanding, face recognition, LLM, and VLM. Super fun.
> draw a pretty good pelican on a bike.
You mean the famously hard task? The one picked because it stretches frontier models to their limits?
A phone makes a very crappy AI inference rig. It's battery powered and can't even really run at 100% utilization on an ongoing basis due to how challenging the thermals are.
I was writing about just this last week for fun: an AI + hardware team-up to build localized AI with functions specialized to your organization. E.g., Adobe Studio AI in an on-premise box, made by Apple and powered by something like Cohere, with privacy:
https://www.notion.so/adeelkhamisa/Cohere-s-next-steps-to-be...
I just did something exactly like this. I have a self-hosted personal dashboard, and one of the APIs I'm reading gives slightly too verbose an output. So I added a feature to summarize the text using Qwen 3.5 2B, which happily runs on a CPU. I've never clocked the tokens per second because I only generate around 100 tokens an hour in a very narrow domain of knowledge, and speed isn't critical.
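For anyone wanting to try the same pattern, here's a minimal sketch, assuming a local Ollama-style server on its default port; the endpoint, model name, and prompt wording are all placeholders, not the commenter's actual setup:

```python
"""Summarize a verbose API response with a small local model.

Assumes an Ollama server is running at the default local endpoint.
The model name is illustrative -- swap in whatever small model you run.
"""
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
MODEL = "qwen2.5:1.5b"  # placeholder: any small CPU-friendly model


def build_prompt(verbose_text: str) -> str:
    # Keep the instruction narrow: speed isn't critical, but focus is.
    return (
        "Summarize the following status text in one short sentence:\n\n"
        + verbose_text
    )


def summarize(verbose_text: str) -> str:
    # Non-streaming request; fine when you only generate ~100 tokens/hour.
    payload = json.dumps({
        "model": MODEL,
        "prompt": build_prompt(verbose_text),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()


# Usage (requires a running server):
#   short = summarize(raw_api_text)
```

Calling the stock HTTP API keeps the dashboard code dependency-free; the tradeoff is you manage the model server yourself.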
And this is exactly what the LLM provider industry is fighting tooth and nail. It's not only because it doesn't directly contribute to their bottom line; it also directly opposes the idea that LLMs are going to replace entire workers rather than enhance the abilities of individual workers. What we're headed towards would have been a killer product, and probably still shifted a bunch of capital to the bazillionaires, had these companies set more realistic goals rather than banking on being the ones that won the war that "changed everything™".