Hacker News

jychang (yesterday at 12:38 PM)

32GB of VRAM is more than enough for Qwen 3.5 35b.

You can just load the Q4_K_XL model like normal and put all tensors on the GPU, without any -ot or --cpu-moe flags.

If for some reason you need a massive context where model + KV cache won't fit in 32GB, then use -ot to move the FFN MoE expert tensors for 1-2 layers into RAM. You'll take a speed hit (those params get loaded from slower system RAM instead of fast VRAM), but it'll work.
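A sketch of what that looks like with llama.cpp's llama-server, assuming a Q4_K_XL GGUF. The model filename, context size, and which layers to offload are illustrative; the -ot regex targets the FFN expert tensors of layers 0 and 1 by name:

```shell
# Normal case: everything fits in 32GB VRAM, no offload flags needed.
# (model path is a placeholder)
llama-server -m model-Q4_K_XL.gguf -ngl 99

# Large-context case: keep everything on GPU except the FFN MoE expert
# tensors of layers 0-1, which -ot (--override-tensor) pins to CPU RAM.
llama-server -m model-Q4_K_XL.gguf -ngl 99 \
  -ot 'blk\.(0|1)\.ffn_.*_exps\.=CPU' \
  -c 131072
```

Offloading only a couple of layers' experts keeps the speed penalty small, since the rest of the model still runs entirely from VRAM.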


Replies

roxolotl (yesterday at 1:11 PM)

Nice, ok, I'll play with that. I'm mostly just learning what's possible. Qwen 3.5 35b has been great without any customization, but it's interesting to learn what the options are.