Hacker News

ozim yesterday at 7:03 PM

I was expecting DGX Spark to run Gemma 31b Q4 much faster.

I was expecting it to run Q8 at 50 tok/s.

I guess it's good that I stopped thinking about buying it, because I would have been disappointed.


Replies

girvo yesterday at 10:22 PM

I love my Spark-alike, but they really aren't inference boxes IMO. They're experimentation boxes. A couple of 20GB 3080s for cheap from China, a 5090, or an RTX Pro 6000 if you can swing the horrible cost: those are better choices IMO.

That said, I'm still running Step 3.7 Flash at ~40 tk/s decode and 1000+ tk/s prefill on mine, and it's both very capable and fast enough.

I got Gemma 31b to run on this at ~22 tk/s decode at FP8 using MTP.