Luckily local AI is becoming more feasible every day.
Maybe for folks who are deep into this, but it's not exactly accessible. I tried reading up on it a couple of months ago, but working out what hardware I needed, which model to run and how to configure it (model size vs. quantization), and how I'd even get the hardware (which, for decent coding results, ran $4k-$10k new last I checked) was a nontrivial barrier to entry. I was trying to do this over a long weekend and ran out of time. I'll have to look into it again, because having the local option would be great.
Edit: the replies to my comment are great examples of what I’m talking about when I say it’s hard to determine what hardware I’d need :).
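For anyone stuck on the same model-size-vs-quantization question, the memory math is the one part that's actually simple: bytes per parameter times parameter count, plus some runtime overhead. A rough sketch in Python (the ~1.2x overhead factor for the KV cache and runtime buffers is my own ballpark assumption, not a measured constant):

    # Rough RAM/VRAM estimate for running a quantized model locally.
    # Ballpark only: the 1.2x overhead factor (KV cache, runtime
    # buffers) is an assumption.
    BITS_PER_PARAM = {"fp16": 16, "q8": 8, "q4": 4}  # common quant levels

    def est_memory_gb(params_b: float, quant: str, overhead: float = 1.2) -> float:
        weights_gb = params_b * BITS_PER_PARAM[quant] / 8  # weights alone
        return weights_gb * overhead

    for quant in ("fp16", "q8", "q4"):
        print(f"70B at {quant}: ~{est_memory_gb(70, quant):.0f} GB")
    # fp16: ~168 GB, q8: ~84 GB, q4: ~42 GB -- i.e. a 4-bit 70B model
    # roughly fits in 48 GB of unified memory; fp16 doesn't come close

That math alone explains most of the hardware sticker shock: quantization is the difference between needing a server and needing a high-end desktop.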
I've been using local AI via LM Studio ever since I canceled my Claude subscription. It's obviously slower than Claude on my M1 Studio[†], but like someone else said, I use AI more like a copilot than an autopilot. I'm pretty enthused that I can give it a small task and let it churn through it for a few minutes while I work on something else alongside it – all for free, with no goddamned arbitrary limits.
[†] The latest Qwen 3.6 whatever has been a noticeable improvement, and I'm not even at the point where I tweak settings like sampling, temperature, etc. No idea what that stuff does; I just use the staff picks in LM Studio and customize the system prompts.
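For what it's worth, those knobs just shape how the next token gets picked: temperature flattens or sharpens the probability distribution, and top-p cuts off the unlikely tail. LM Studio also runs an OpenAI-compatible local server, so you can set them from code; a sketch, assuming the server is enabled on its default port and the model id placeholder below is swapped for whatever LM Studio shows for your loaded model:

    # Querying LM Studio's local OpenAI-compatible server.
    # Assumes the server is running (default http://localhost:1234)
    # and a model is loaded; the model id below is a placeholder.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1",
                    api_key="lm-studio")  # key is ignored locally

    resp = client.chat.completions.create(
        model="your-loaded-model",  # use the id LM Studio lists
        messages=[{"role": "user", "content": "Summarize this function for me."}],
        temperature=0.2,  # lower = more deterministic, good for code
        top_p=0.9,        # sample only from the top 90% probability mass
    )
    print(resp.choices[0].message.content)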
Feasibility on commodity hardware would be the real milestone. Running high-end computers is the only way to get decent results at the moment, but if we can run inference on CPUs, NPUs, and GPUs in everyday hardware, the moat should disappear.
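To be fair, the plumbing for that already exists: llama.cpp and its bindings run GGUF models on plain CPUs today, with optional GPU offload. A sketch with the Python bindings (the model file name is a placeholder; quality at commodity-friendly sizes is the open question):

    # CPU-only inference with llama.cpp's Python bindings.
    # pip install llama-cpp-python; the GGUF path is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./some-8b-model.q4_k_m.gguf",  # any quantized GGUF file
        n_gpu_layers=0,  # 0 = pure CPU; raise it to offload layers to a GPU
        n_ctx=8192,      # context window; bigger costs more RAM
    )
    out = llm("Explain what an NPU is in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])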
Indeed, I feel like we're in the AI equivalent of the early computer era, where giant, expensive hardware is still required for frontier models. In five years I bet there will be fully open models we'll be able to run on a few thousand dollars of consumer hardware with performance equivalent to Opus 4.7/4.6.
Sure, but local AI is still a black box. Models can be influenced by training-data selection, poisoning, hidden system prompts, etc. That recent WordPress supply-chain hack goes to show that the rug can still be pulled even if the software is FOSS.
I love how it's just tacitly understood that these companies' entire MO is to carve out a territory, get everyone hooked on the good stuff, and then jack up the price once users are addicted and captured -- literally the business plan of crack dealers, and it's just business as usual in the tech industry.
Not really. The hardware requirements remain indefinitely out of reach.
Yes, it's possible to run tiny quantized models, but you're working with extremely small context windows and tons of hallucinations. It's fun to play with them, but they're not at all practical.
It feels more and more like OpenAI/Anthropic aren't the future, but Qwen, Kimi, or DeepSeek are. You can run them locally, but that isn't really the point; it's about the democratization of service providers. You can run any of them on a dozen providers with different trade-offs/offerings, or locally.
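That democratization is already concrete in practice: most of these providers (and local runners) speak the OpenAI-compatible API, so switching is usually a one-line base_url change. A sketch; the endpoints and model ids here are made-up placeholders, check each provider's docs for real values:

    # Same client code against interchangeable providers.
    # URLs and model ids are illustrative placeholders only.
    from openai import OpenAI

    PROVIDERS = {
        "local":      ("http://localhost:1234/v1",     "qwen3-32b"),
        "provider_a": ("https://api.example-a.com/v1", "deepseek-v3"),
        "provider_b": ("https://api.example-b.com/v1", "kimi-k2"),
    }

    def ask(provider: str, prompt: str, api_key: str = "unused-locally") -> str:
        base_url, model = PROVIDERS[provider]
        client = OpenAI(base_url=base_url, api_key=api_key)
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Swapping providers is one string; the calling code never changes.
    print(ask("local", "Hello"))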
Those models won't ever be SOTA, simply because of money, but "last year's SOTA" at a quarter of the cost or less may be good enough. More quantity and more flexibility at lower peak quality can make sense: a 7% dumber agent TEAM vs. a single, objectively superior super-agent (rough numbers below).
That's the most exciting thing going on in that space: new workflows opening up not because of intelligence improvements, but because of cost improvements for "good enough" intelligence.
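A back-of-envelope sketch of that trade; the linear "quality scales with agent count" assumption is mine and very generous, but it shows the shape of the argument:

    # Toy cost/quality comparison for the team-vs-super-agent trade.
    # The linear scaling assumption is a big simplification.
    sota_cost, sota_quality = 1.00, 1.00  # single frontier agent, normalized
    open_cost, open_quality = 0.25, 0.93  # "7% dumber" at 1/4 the price

    budget = 1.00                           # same spend either way
    team_size = budget / open_cost          # -> 4 open agents
    team_output = team_size * open_quality  # -> 3.72 vs 1.00
    print(f"team of {team_size:.0f}: {team_output:.2f} vs single SOTA: {sota_quality:.2f}")

Whether agent output actually parallelizes anywhere near that well is the real question, but it's why "cheap and almost as good" can beat "best."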