Hacker News

root_axis · yesterday at 1:47 AM · 8 replies

You are greatly underestimating the hardware requirements for productive local LLMs. Research consistently shows that parameter count sets the practical ceiling for a model's reliability. Quantized models with double-digit parameter counts will never be reliable enough to achieve results in the realm of something like Opus 4.6.
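For scale, here is a rough back-of-the-envelope for how parameter count and quantization level translate into memory; the formula and the ~20% overhead factor are my own ballpark figures, not from the comment.

```python
def vram_gb(params_billions: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM needed to hold a model's weights: params * bits/8 bytes,
    padded ~20% for KV cache and activations (a ballpark, not an exact figure)."""
    bytes_for_weights = params_billions * 1e9 * bits_per_weight / 8
    return bytes_for_weights / 1e9 * overhead

# A 70B model at 4-bit quantization needs roughly 42 GB; at 16-bit, roughly 168 GB.
```

This is why quantization matters so much for local use: dropping from 16-bit to 4-bit weights cuts the memory footprint by 4x, at some cost in fidelity.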


Replies

thot_experiment · yesterday at 5:47 AM

Flat wrong. Q6 Gemma 31b feels a lot like Opus 4.5 to me when run in a harness so it can retrieve information and ground itself. The gap is not that big for a lot of use cases. Qwen MoE is fast as fuck locally for things that are oneshottable. I have subscriptions to all the major providers right now, and since Gemma 4 and Qwen 3.6 came out I haven't hit limits a single time. I'm actually super surprised by the number of times I try something with Gemma 4 intending to see how it fails before handing it to Claude, only to come away with something perfectly usable from the local model.
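For what it's worth, the "harness" pattern described here is mostly prompt plumbing. A minimal sketch of the idea — the function names, prompt wording, and the `retrieve`/`generate` callables are placeholders for whatever local stack (llama.cpp, Ollama, etc.) you actually run, not any particular tool's API:

```python
def grounded_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Build a prompt that grounds a small local model in retrieved text,
    so it reasons over supplied context instead of relying on its weights."""
    context = "\n\n".join(f"[doc {i + 1}] {d}" for i, d in enumerate(retrieved_docs))
    return (
        "Answer using ONLY the context below. Cite doc numbers.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def answer(question, retrieve, generate, k=3):
    """retrieve(question, k) -> list[str]; generate(prompt) -> str
    (plug in your local model's completion call here)."""
    return generate(grounded_prompt(question, retrieve(question, k)))
```

The point is that the retrieval step does the "knowing," so the local model only has to do the comparatively easy job of reading and restating context.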

segmondy · yesterday at 2:05 AM

Joke's on you. We are already running Deepseekv4Flash, Mimo2.5, MiniMax2.7, and Qwen3-397B locally on very affordable hardware. These models are in the realm of Opus 4.6. For those of us who are a bit crazy, we are running KimiK2.6, GLM5.1 and more ...

wincy · yesterday at 1:55 AM

Won’t these H100s drop in price in a few years? With the data center build-out, surely they will fall to 1/10th the price, and you’ll be able to set up a local LLM as good as Opus 4.7. Even if frontier models become more advanced and memory-hungry, couldn’t you draw the same power as your oven to run a current-day frontier model as needed? If I could drop $10,000 today for an effectively permanent Opus 4.7 subscription, I would.

CuriouslyC · yesterday at 2:03 AM

Parameter size gets you world knowledge and better persistence of behavior as context grows. Both of those things can be engineered around to a large degree, and the latest Qwen models show that small models can be quite smart in narrow domains and short time windows.

stubish · yesterday at 5:37 AM

It depends on what you mean by 'productive'. The article mainly seems to be targeting consumer-level hardware, such as the Neural Processing Unit you need for a 'Copilot PC'. Windows Recall is (was?) one such local AI application. If Microsoft get their way and my next PC has one, I look forward to using it for 'productive' purposes such as playing games, handling natural-language stuff, and leaving my GPU free for GPUing.

byzantinegene · yesterday at 1:50 AM

I would argue we don't need anything near Opus to be productive. Sonnet is plenty productive.

ActorNightly · yesterday at 4:39 PM

Yes and no.

The best analogy is the difference between having N senior level engineers working for you, versus having N entry level engineers.

With frontier cloud models, you can give a single invocation one task, and it can figure everything out.

With local models, you have to manage the inputs and outputs quite a bit more, but you can achieve similar results for tasks you set up harnesses for. They are not as good at finding the right answer internally from their own weights, but they are very capable of ingesting context and reformatting text. For debugging, for example, local models can work through issues quite well if you give them the error message and the documentation for the feature you are trying to implement.
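That debugging workflow amounts to careful context assembly. A hypothetical helper to illustrate — the prompt wording and parameter names are mine, not from any tool:

```python
def debug_prompt(error_text: str, doc_snippet: str, code: str) -> str:
    """Pack the traceback, the relevant docs, and the failing code into one
    prompt, so a small local model reasons over supplied context rather than
    trying to recall API details from its weights."""
    return (
        "You are debugging. Use ONLY the documentation below; do not invent APIs.\n\n"
        f"Error:\n{error_text}\n\n"
        f"Documentation:\n{doc_snippet}\n\n"
        f"Code:\n{code}\n\n"
        "Explain the likely cause and propose a minimal fix."
    )
```

The management overhead the comment describes is exactly this: you, not the model, decide which error and which docs go into the window.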

josteink · yesterday at 7:07 AM

> You are greatly underestimating the current hardware requirements for productive local LLMs.

Fixed that for you. Right now most models are based on floating-point math and probabilities, which are "expensive" to compute.

Microsoft has researched 1-bit LLMs which can run much more efficiently, and on much cheaper hardware[1].

If this research is reproducible and reusable outside their research models, the cost of running self-hosted LLMs will drop by an order of magnitude once it hits mainstream.

[1] https://github.com/microsoft/BitNet
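The efficiency claim rests on BitNet's ternary weights: when every weight is -1, 0, or +1, matrix multiplication collapses into additions and subtractions, with no weight multiplies. A toy NumPy illustration of the idea — the absmean scaling loosely follows the BitNet b1.58 scheme, but this is a sketch of the concept, not the actual kernel:

```python
import numpy as np

def ternarize(W: np.ndarray):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor absmean scale."""
    scale = np.abs(W).mean()
    Wq = np.clip(np.round(W / (scale + 1e-8)), -1, 1)
    return Wq, scale

def ternary_matmul(x: np.ndarray, Wq: np.ndarray, scale: float) -> np.ndarray:
    """With ternary weights, each output element is just a signed sum of
    inputs, rescaled once at the end -- no per-weight multiplications."""
    return (x @ Wq) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
Wq, s = ternarize(W)                              # Wq holds only -1, 0, +1
y = ternary_matmul(rng.normal(size=(2, 8)), Wq, s)
```

Trading multiplies for adds (and shrinking weights to under 2 bits each) is what makes cheap CPUs and NPUs plausible inference targets.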