Hacker News

porphyra, yesterday at 5:59 PM

You can also run the Qwen 3.6 27B dense model on a DGX Spark with comparable performance [1][2] for about $4000 (the Asus Ascent GX10 is $3999 at various retailers).

In theory you can also get 48GB of VRAM with, say, two 3090s, but they will take up a lot of space and generate a lot of heat compared to the MacBook Pro and GB10.

[1] https://x.com/MiaAI_lab/status/2070859135399182444

[2] https://github.com/MiaAI-Lab/Qwen3.6-27B-NVFP4-vLLM
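
For a rough sense of why a 27B model fits comfortably in this class of hardware, here is a back-of-the-envelope sketch of weight storage only. The 4.5 bits-per-weight figure for NVFP4 is an assumption (4-bit values plus one 8-bit scale per 16-value block), and the estimate ignores KV cache, activations, and whether a given GPU supports the format at all.

```python
# Rough weight-memory estimate. The bit widths are assumptions, not measurements.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a dense model."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

PARAMS_B = 27        # "27B dense" from the comment above
NVFP4_BITS = 4.5     # assumed: 4-bit values plus an 8-bit scale per 16 values
BF16_BITS = 16       # unquantized baseline for comparison

nvfp4_gb = weight_gb(PARAMS_B, NVFP4_BITS)
bf16_gb = weight_gb(PARAMS_B, BF16_BITS)
dual_3090_gb = 2 * 24  # two 24GB cards, the 48GB figure above

print(f"NVFP4 weights: ~{nvfp4_gb:.1f} GB")
print(f"BF16 weights:  ~{bf16_gb:.1f} GB")
print(f"NVFP4 weights under {dual_3090_gb} GB: {nvfp4_gb < dual_3090_gb}")
```

Under these assumptions the quantized weights come to roughly 15 GB, while the unquantized BF16 weights alone would already exceed 48GB.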


Replies

Zetaphor, today at 3:22 AM

Alternatively, you could run it on Strix Halo for $1,000 less, and while it may be slightly slower, you won't have to deal with NVIDIA's shit on Linux or worry about having to use their custom kernels or Ubuntu.

esperent, yesterday at 6:02 PM

> 48GB of VRAM with, say, two 3090s

So like... $2000+ just for the used GPUs? Plus I assume it's considerably more effort to get it working.

lee_ars, yesterday at 9:18 PM

The tweet you link shows "Qwen 3.6 35b NVFP4 - 256k ctx, 110 tok/s", but I'm getting only half that, around 50 tok/sec, on a DGX Spark with Qwen3.6-35B-A3B-NVFP4 (via vLLM) plus speculative decode w/EAGLE3. I'd be ecstatic to see 110 tok/sec and I wish they had some more sourcing for the exact config, because it's double what I'm getting.

edit - after actually reading the tweets (had to use xcancel) and visiting the source git repo, switching to MTP for speculative decode makes things a hell of a lot faster, and the abliterated model plus dflash makes it even faster! I'm now seeing 70-90 tok/sec for most stuff. I like!
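
To put the numbers in this comment side by side, here is a small sketch that uses only the figures quoted above (110 tok/s claimed, about 50 tok/s with EAGLE3, 70-90 tok/s after the changes in the edit). It is plain arithmetic on reported values, not a new benchmark, and the variable names are just for illustration.

```python
# All figures are taken from the comment above; nothing here is a new measurement.
CLAIMED_TOK_S = 110      # quoted from the linked tweet
BASELINE_TOK_S = 50      # reported with EAGLE3 speculative decode
AFTER_TOK_S = (70, 90)   # reported range after the changes described in the edit

# Speedup over the EAGLE3 baseline, for the low and high end of the range.
speedup = tuple(round(t / BASELINE_TOK_S, 2) for t in AFTER_TOK_S)

# Fraction of the tweet's claimed throughput that was reached.
share_of_claim = tuple(round(t / CLAIMED_TOK_S, 2) for t in AFTER_TOK_S)

print(f"Speedup over baseline: {speedup[0]}x to {speedup[1]}x")
print(f"Share of claimed 110 tok/s: {share_of_claim[0]:.0%} to {share_of_claim[1]:.0%}")
```

So the reported change is roughly a 1.4x to 1.8x improvement, which still lands at about 64% to 82% of the tweet's figure.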
