walrus01 • today at 1:40 AM • 3 replies

I personally find any model smaller than something like Qwen 3.6 35B-A3B (8-bit quantization, about 49GB memory usage when loaded into llama.cpp) to be too "stupid" for reliable use.

I would much rather offload the model to a system sitting under my desk in my home office, accessible via VPN, than run it on my laptop and risk using an unreliable, flaky tool just for the convenience of having it on the hardware in my lap.

I pay very little attention to 8-billion-parameter (or even much smaller) models these days, and I don't feel like I'm missing much.
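
For a rough sense of the sizes being compared here, weight memory scales as parameter count times bits per weight. The sketch below is only back-of-the-envelope arithmetic: the function name `weight_memory_gb` is made up for illustration, the parameter counts are the nominal ones from the model names, and the bit widths are nominal too (real quantization formats add some per-block overhead). It covers weights only, so it will come in under a loaded-model figure like the 49GB above, which presumably also includes the KV cache and runtime buffers.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts taken from the model names in the thread.
# Nominal bit widths: actual quant formats store per-block scales, so the
# effective bits per weight are somewhat higher than these labels suggest.
for label, n_params in [("8B", 8e9), ("27B", 27e9), ("35B", 35e9)]:
    sizes = {bits: round(weight_memory_gb(n_params, bits), 1) for bits in (4, 6, 8)}
    print(label, sizes)
```

By this estimate a 35B model needs about 35 GB for weights at 8 bits and about 17.5 GB at 4 bits, while an 8B model at 8 bits needs about 8 GB.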

Replies

satvikpendem • today at 2:31 AM

Qwen 3.6 27B dense is much better than the 35B MoE model for coding, not sure if you've tried that yet.

theanonymousone • today at 6:35 AM

Have you seen 8-bit quantisation matter a lot? The "consensus" on r/LocalLLaMA is that the loss is tolerable down to 4 bits.

thot_experiment • today at 2:39 AM

q6 is fine for that Qwen with the context (KV cache) at q8, and the dense models of that size are solid at q4 with q8 context.
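
To see why the context precision matters separately from the weight precision, here is a minimal sketch of the usual KV cache size estimate for standard attention: two tensors (K and V) per layer, each holding `n_kv_heads * head_dim` values per token. The function name `kv_cache_gb` and every architecture number in `cfg` are hypothetical placeholders, not the real Qwen configuration, and models that mix in other attention variants would not follow this formula exactly.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bits_per_elem: float) -> float:
    """Approximate KV cache size in decimal gigabytes for standard attention."""
    # Factor of 2: one K tensor and one V tensor per layer.
    elems = 2 * n_layers * n_kv_heads * head_dim * ctx_len
    return elems * bits_per_elem / 8 / 1e9


# Hypothetical architecture values for illustration only.
cfg = dict(n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=32768)

# Nominal bit widths for the cache; quantized formats add a little overhead.
for name, bits in [("f16", 16), ("q8", 8), ("q4", 4)]:
    print(name, round(kv_cache_gb(bits_per_elem=bits, **cfg), 2), "GB")
```

Under these assumed numbers, dropping the cache from 16 bits to 8 bits halves its footprint, which is memory that can go toward a higher-precision weight quant or a longer context.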
