What reasonable comparable model can be run locally on say 16GB of video memory compared to Opus 4.6...

nodesocket • yesterday at 6:12 AM • 4 replies

Which model reasonably comparable to Opus 4.6 can be run locally on, say, 16GB of video memory? As far as I know, Kimi (while good) needs serious GPUs: an RTX 6000 Ada at minimum, more likely an H100 or H200.

Replies

berkes • yesterday at 1:35 PM

Devstral¹ has very good models that can be run locally.

They are near the top among open models, and surpass some closed models.

I've been using Devstral, Codestral and Le Chat exclusively for three months now, all through Mistral's hosted versions: agentic use, completion, and day-to-day stuff. It's not perfect, but neither is any other model or product, so it's good enough for me. Less anecdotal are the various benchmarks that put them surprisingly high in the rankings.

¹https://mistral.ai/news/devstral

mixermachine • yesterday at 7:56 AM

Nothing will come close to Opus 4.6 here. You will be able to fit a distilled 20B to 30B model on your GPU. gpt-oss-20b is quite good in my local testing on a MacBook Pro (M2 Pro, 32GB).

The bigger downside, compared to Opus or any other hosted model, is the limited context. You might be able to achieve around 30k tokens. Hosted models often have 128k or more. Opus 4.6 has 200k as standard and 1M in API beta.
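To see why context is the squeeze on a 16GB card, here is a rough back-of-the-envelope sketch. The model shape below (24B parameters, 4-bit weights, 40 layers, 8 KV heads, head dimension 128, 16-bit KV cache) is a hypothetical example, not the spec of any model named in this thread, and the estimate ignores activations, runtime overhead, and tricks such as sliding-window attention or KV cache quantization.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the quantized weights, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB for full attention.

    The leading 2 accounts for storing both keys and values.
    """
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_elem)
    return total_bytes / 2**30


# Hypothetical model shape (an assumption for illustration only).
weights = weights_gib(24e9, bits_per_weight=4)
kv_30k = kv_cache_gib(n_layers=40, n_kv_heads=8, head_dim=128,
                      context_tokens=30_000)
kv_128k = kv_cache_gib(n_layers=40, n_kv_heads=8, head_dim=128,
                       context_tokens=128_000)

print(f"weights:        {weights:.1f} GiB")
print(f"KV cache @30k:  {kv_30k:.1f} GiB")
print(f"KV cache @128k: {kv_128k:.1f} GiB")
print(f"total @30k:     {weights + kv_30k:.1f} GiB")
```

Under these assumptions the weights plus a 30k-token cache land just under 16 GiB, while a 128k-token cache alone would already exceed the card.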

lodovic • yesterday at 7:12 AM

I made something similar to this project, and tested it against a few 3B and 8B models (Qwen and Ministral, both the instruction and the reasoning variants). I was pleasantly surprised by how fast and accurate these small models have become. I can ask it things like "check out this repo and build it", and with a Ralph strategy eventually it will succeed, despite the small context size.
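A minimal sketch of the loop idea, assuming "Ralph strategy" means re-running the agent with the same prompt and a fresh context until a check passes, with progress persisting outside the model (for example in the repo on disk). The names `ralph_loop`, `run_agent` and `check` are hypothetical placeholders, not part of any real tool.

```python
from typing import Callable, Optional


def ralph_loop(run_agent: Callable[[str], str],
               check: Callable[[str], bool],
               prompt: str,
               max_iters: int = 20) -> Optional[int]:
    """Re-run the agent until `check` accepts its result.

    Each call receives only the fixed prompt, so no conversation history
    accumulates; this is what keeps a small context window workable.
    Returns the attempt number that succeeded, or None if none did.
    """
    for attempt in range(1, max_iters + 1):
        result = run_agent(prompt)
        if check(result):
            return attempt
    return None


# Usage with a stand-in agent that "fixes the build" on its third run.
calls = {"n": 0}


def fake_agent(prompt: str) -> str:
    calls["n"] += 1
    return "build ok" if calls["n"] >= 3 else "build failed"


attempts_needed = ralph_loop(fake_agent, lambda out: out == "build ok",
                             prompt="check out this repo and build it")
print(attempts_needed)
```

In a real setup `run_agent` would call the local model and `check` would run the build or test suite.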

PeterStuer • yesterday at 8:34 AM

Nothing close to Opus is available in open weights. That said, do all your tasks need the power of Opus?
