
walrus01 today at 1:40 AM

I personally find any model smaller than something like Qwen 3.6 35B-A3B (8-bit quantization, about 49GB memory usage when loaded into llama.cpp) to be too "stupid" for reliable use.

I would much rather offload the model to a system sitting under my desk in my home office, accessible via VPN, than run it on my laptop and risk using an unreliable, flaky tool just for the convenience of having it on the hardware on my lap.

I pay very little attention to models around 8 billion parameters (or even much smaller) these days, and I don't feel like I'm missing much.


Replies

satvikpendem today at 2:31 AM

Qwen 3.6 27B dense is much better than the 35B MoE model for coding, not sure if you've tried that yet.

theanonymousone today at 6:35 AM

Have you seen 8-bit quantisation matter a lot? The "consensus" on r/LocalLLaMA is that the loss is tolerable down to 4 bits.

thot_experiment today at 2:39 AM

q6 is fine for that Qwen with ctx @ q8, and the dense models of that size are solid at q4 with q8 ctx.
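
For anyone weighing the quantization levels discussed in this thread, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at each bit width. It assumes nominal bits per weight (real quantization formats also store per-block scales, so the effective figure tends to be somewhat higher) and ignores the context cache and runtime buffers, so a loaded model will use more than these numbers. The function name `weight_memory_gb` and the rounded parameter counts are illustrative assumptions, not anything taken from llama.cpp.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    Context cache and runtime buffers are not included.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Rounded parameter counts for the two models mentioned in the thread.
# A mixture-of-experts model still has to hold all of its weights in memory,
# even though only a few billion parameters are active per token.
MODELS = [("35B MoE", 35.0), ("27B dense", 27.0)]

for name, params in MODELS:
    for bits in (4, 6, 8):
        gb = weight_memory_gb(params, bits)
        print(f"{name} @ {bits}-bit: ~{gb:.1f} GB of weights")
```

The gap between a weights-only estimate like this and the footprint you actually see once the model is loaded comes from the overheads left out above, which grow with context length.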