Saris, today at 12:38 PM

Qwen3.6-35b-a3b at 64k context runs quite well on my 12GB VRAM GPU with MoE partially offloaded to CPU. It does use a good chunk of system RAM too, but I get about 40-50 tok/s.
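For a rough sense of why the offload is needed, here is a back-of-envelope sketch of the weight sizes. The 35B total and roughly 3B active parameter counts are read off the model name; the 4.5 bits per weight is purely an assumed quantization level (the comment does not say which quant is used), and KV cache and runtime overhead are ignored.

```python
# Rough weight-size arithmetic; not a measurement of any real setup.

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GiB for a given parameter count."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

TOTAL_PARAMS_B = 35.0    # "35b" in the model name
ACTIVE_PARAMS_B = 3.0    # "a3b" read as ~3B active parameters per token
BITS_PER_WEIGHT = 4.5    # ASSUMPTION: a 4-bit-class quant; not stated in the comment
VRAM_GIB = 12.0          # the 12GB card, treated as roughly 12 GiB

total = weight_gib(TOTAL_PARAMS_B, BITS_PER_WEIGHT)
active = weight_gib(ACTIVE_PARAMS_B, BITS_PER_WEIGHT)

print(f"all weights:    {total:.1f} GiB")
print(f"active weights: {active:.1f} GiB")
print(f"fits in VRAM without offload: {total <= VRAM_GIB}")
```

Under those assumptions the full set of weights is larger than the card, while the weights touched per token are a small fraction of it, which is consistent with keeping part of the MoE in system RAM and still getting usable speed.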