I have spent a HUGE amount of time over the last two years experimenting with local models.
A few lessons learned:
1. Small models like the new qwen3.5:9b can be fantastic for local tool use, information extraction, and many other embedded applications.
2. For coding tools, just use Google Antigravity and gemini-cli, or Anthropic Claude, or...
Now to be clear, I have spent perhaps 100 hours in the last year configuring local models for coding using Emacs, Claude Code (configured for local), etc. However, I am retired and this time was a lot of fun for me: lots of effort trying to maximize local-only results. I don't recommend it for others.
I do recommend getting very good at using embedded local models in small practical applications. Sweet spot.
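To make the "embedded information extraction" sweet spot concrete, here is a minimal sketch of how such a pipeline tends to look: prompt a small local model for JSON-only output, then parse its reply defensively. The prompt template, field names, and helper functions here are hypothetical illustrations, not the commenter's actual setup; the model call itself (e.g. via an Ollama endpoint) is omitted and stubbed with a canned reply.

```python
import json

# Hypothetical extraction prompt for a small local model (names and
# fields are made up for illustration). The real call would go to a
# local inference server such as Ollama; it is omitted here.
EXTRACTION_PROMPT = """Extract the fields below from the text.
Reply with a single JSON object only, no commentary.
Fields: name, date, amount
Text: {text}"""

def build_prompt(text: str) -> str:
    """Fill the extraction template with the input text."""
    return EXTRACTION_PROMPT.format(text=text)

def parse_reply(reply: str) -> dict:
    """Parse the model's reply, tolerating stray prose around the JSON.

    Small models often wrap JSON in chatter, so we slice from the
    first '{' to the last '}' before parsing.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object in reply")
    return json.loads(reply[start:end + 1])

# Canned reply standing in for a real model response:
canned = 'Sure! {"name": "Acme Corp", "date": "2024-03-01", "amount": "$120"}'
print(parse_reply(canned)["name"])  # → Acme Corp
```

The defensive slice-then-parse step is the part that matters in practice: small local models are good at this task but not perfectly reliable at emitting bare JSON, so the surrounding application should expect and strip chatter.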
What about running e.g. Qwen3.5 128B on a rented RTX Pro 6000?
What kind of hardware did you use? I suppose that an 8GB gaming GPU and a Mac Pro with 512 GB unified RAM give quite different results, both formally being local.
I've been really interested in the difference between Qwen3.5 9b and 14b for information extraction. Is there a discernible difference in quality or capability?
I'd love to know how you fit smaller models into your workflow. I have an M4 MacBook Pro w/ 128GB RAM, and while I have toyed with some models via ollama, I haven't really found a nice workflow for them yet.