Hacker News

paxys · last Wednesday at 9:59 PM

> Reduce your expectations about speed and performance!

Wildly understating this part.

Even the best local models (ones you run on beefy 128GB+ RAM machines) get nowhere close to the sheer intelligence of Claude/Gemini/Codex. At worst these models will move you backwards and just increase the amount of work Claude has to do when your limits reset.


Replies

andai · yesterday at 1:12 AM

Yeah, this is why I ended up getting a Claude subscription in the first place.

I was using GLM on the ZAI coding plan (jerry-rigged Claude Code for $3/month), but found myself asking Sonnet to rewrite 90% of the code GLM was giving me. At some point I was like "what the hell am I doing" and just switched.

To clarify, the code I was getting before mostly worked; it was just a lot less pleasant to look at and work with. Might be a matter of taste, but I found it had a big impact on my morale and productivity.

tracker1 · yesterday at 5:33 PM

From my relatively limited exposure, I'm not sure I'd be able to tolerate it. I've found Claude/Opus to be pretty nice to work with... by contrast, I find GitHub Copilot to be the most annoying thing I've ever tried to work with.

Because of how the plugin works in VS Code, on my third day of testing Claude Code I didn't click the Claude button and accidentally worked with Copilot for about three hours of torture before I realized I wasn't in Claude Code. Will NEVER make that mistake again... I can only imagine anything I can run at any decent speed locally will be closer to the latter. I pretty quickly reach an "I can do this faster/better myself" point... even a few times with Claude/Opus, so my patience isn't always the greatest.

That said, I love how easy it is to build up a boilerplate app scaffold for the sole purpose of testing a single library/function in isolation from a larger application. In 5-10 minutes, I've got enough of a test harness around what I'm trying to work on/solve to focus on the problem at hand, without worrying about doing it inside the larger integrated project.

I've still got some thinking and experimenting to do to improve some of my workflows... but I will say that AI assist has definitely been a multiplier for my own productivity. At this point, there's literally no excuse not to have actual running code experiments when learning something new, connecting to something you haven't used before, etc., while working on a solution to a problem; assuming you have at least a rudimentary understanding of what you're actually trying to accomplish in the piece you're working on. I still don't have enough trust to use AI to build a larger system, or for that matter to truly just vibe-code anything.

zozbot234 · last Wednesday at 10:05 PM

The best open models such as Kimi 2.5 are about as smart today as the big proprietary models were one year ago. That's not "nothing" and is plenty good enough for everyday work.

EagnaIonat · yesterday at 7:54 AM

The secret is to not run out of quota.

Instead, have Claude know when to offload work to local models and which model is best suited for the job. It shapes the prompt for the local model; then have Claude review the results (rough sketch below). Massive reduction in costs.

btw, at least on MacBooks you can run good models with just an M1 and 32GB of memory.
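
A minimal sketch of that offload pattern, assuming an OpenAI-compatible local endpoint (e.g. Ollama on its default port); the model names and routing table are made up, and none of this is built into Claude Code — you'd wire it up yourself, e.g. as a custom tool Claude can call:

    # Helper Claude could call to hand routine work to a local model;
    # Claude then reviews the returned draft itself.
    from openai import OpenAI  # pip install openai; talks to any OpenAI-compatible server

    local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # e.g. Ollama

    # Hypothetical routing table: which local model suits which task type.
    MODELS = {
        "boilerplate": "qwen2.5-coder:32b",
        "docstrings": "llama3.1:8b",
    }

    def offload(task_type: str, prompt: str) -> str:
        """Send a shaped prompt to the best-suited local model; return its draft."""
        resp = local.chat.completions.create(
            model=MODELS[task_type],
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content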

bityard · last Wednesday at 11:40 PM

Correct, nothing that fits on your desk or lap is going to compete with a rack full of datacenter equipment. Well spotted.

But as a counterpoint: there are whole communities of people in this space who get significant value from models they run locally. I am one of them.

anon373839 · yesterday at 2:29 AM

It's true that open models are a half-step behind the frontier, but I can't say that I've seen "sheer intelligence" from the models you mentioned. Just a couple of days ago Gemini 3 Pro was happily writing naive graph traversal code without any cycle detection or safety measures. If nothing else, I would have thought these models could nail basic algorithms by now?
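
For reference, the missing safety measure is just a visited set. An illustrative sketch of what you'd expect (not Gemini's actual output):

    # Iterative DFS with a visited set so traversal terminates on cyclic graphs.
    # The naive version, without `visited`, loops forever on any cycle.
    def dfs(graph: dict[str, list[str]], start: str) -> list[str]:
        visited: set[str] = set()
        order: list[str] = []
        stack = [start]
        while stack:
            node = stack.pop()
            if node in visited:
                continue  # cycle or shared ancestor: already expanded
            visited.add(node)
            order.append(node)
            stack.extend(graph.get(node, []))
        return order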

altern8 · yesterday at 4:05 PM

I was wondering the same thing: if it takes tens or hundreds of millions of dollars to train a model and keep it up to date, how can an open-source one compete with that?

majormajor · yesterday at 5:54 AM

The amount of "prompting" stuff (meta-prompting?) the "thinking" models do behind the scenes, even beyond what the harnesses do, is massive; you could of course rebuild it locally, but it's gonna make everything just that much slower.

I expect it'll come along but I'm not gonna spend the $$$$ necessary to try to DIY it just yet.

acchow · yesterday at 6:19 AM

I agree. You could spin for 100 hours on a sub-par model or get it done in 10 minutes with a frontier model.

seanmcdirmid · yesterday at 1:27 AM

> (ones you run on beefy 128GB+ RAM machines)

PC or Mac? A PC, ya, no way, not without beefy GPUs with lots of VRAM. A Mac? Depends on the CPU; an M3 Ultra with 128GB of unified RAM is going to get closer, at least. You can have decent experiences with a Max CPU + 64GB of unified RAM (well, that's my setup at least).

richstokes · yesterday at 12:25 AM

This. It's a false economy if you value your time even slightly; pay for the extra tokens and use the premium models.

mycall · yesterday at 1:22 AM

There are tons of improvements coming in the near future. Even a Claude Code developer said he aimed to deliver a product built for future models that he bet would improve enough to fulfill his assumptions. Parallel vLLM MoE local LLMs on a Strix Halo with 128GB have some life in them yet.
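
For anyone curious, a minimal vLLM offline-inference sketch; the model name and sampling settings here are placeholders, not a tuned Strix Halo config:

    from vllm import LLM, SamplingParams  # pip install vllm

    llm = LLM(model="Qwen/Qwen3-30B-A3B")  # placeholder MoE model; pick what fits your RAM
    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate(["Explain MoE routing in two sentences."], params)
    print(outputs[0].outputs[0].text)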

0xbadcafebee · yesterday at 12:52 AM

The best local models are literally right behind Claude/Gemini/Codex. Check the benchmarks.

That said, Claude Code is designed to work with Anthropic's models. Agents have a buttload of custom work going on in the background to massage specific models to do things well.

dheera · last Wednesday at 10:54 PM

Maybe add to the Claude system prompt that it should work efficiently or else its unfinished work will be handed off to a stupider junior LLM when its limits run out, and it will be forced to deal with the fallout the next day.

That might incentivize it to perform slightly better from the get-go.

cat_plus_plus · yesterday at 3:30 PM

Depends on whether you want a programmer or a therapist. Given a clear description of the class structure and key algorithms, Qwen3-Coder is way more likely to do exactly what is being asked than any Gemini model. If you want to turn a vague idea into a design, yeah, the cloud bot is better.

Let's not forget that cloud bots have web search; if you hook up a local model to GPT Researcher or an Onyx frontend, you will see reasonable performance, although open-ended research is where cloud model scale does pay off — provided it actually bothers to search rather than hallucinating to save backend costs. Also, a local uncensored model is way better at doing proper security analysis of your app/network.

bicx · last Wednesday at 10:53 PM

Exactly. The comparison benchmark in the local LLM community is often GPT _3.5_, and most home machines can’t achieve that level.

amelius · yesterday at 11:17 AM

And at best?

mlrtime · yesterday at 3:11 AM

The local ones yeah...

I have Claude Pro ($20/mo) and sometimes run out. I just set ANTHROPIC_BASE_URL to a local LLM API endpoint that connects to a cheaper OpenAI model. I can continue with smaller tasks no problem. People have been doing this for a long time.
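
Roughly what that looks like; the port and model name are placeholders for whatever your local proxy exposes, and the proxy itself has to speak the Anthropic Messages API:

    # Claude Code reads ANTHROPIC_BASE_URL; the same redirect works from the SDK.
    import os
    from anthropic import Anthropic  # pip install anthropic

    os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:4000"  # hypothetical local proxy

    client = Anthropic(base_url=os.environ["ANTHROPIC_BASE_URL"], api_key="unused")
    msg = client.messages.create(
        model="local-coder",  # whatever name the proxy maps to a cheaper model
        max_tokens=512,
        messages=[{"role": "user", "content": "Add type hints to utils.py"}],
    )
    print(msg.content[0].text)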

DANmode · yesterday at 12:11 AM

And you really should be measuring based on the worst-case scenario for tools like this.

nik282000 · last Wednesday at 10:38 PM

> intelligence

Whether it's a giant corporate model or something you run locally, there is no intelligence there. It's still just a lying engine. It will tell you the string of tokens most likely to come after your prompt based on training data that was stolen and used against the wishes of its original creators.