I don't mean to be a jerk, but 2-bit quant, reducing experts from 10 to 4, who knows if the test is running long enough for the SSD to thermal throttle, and still only getting 5.5 tokens/s does not sound useful to me.
It's a lot more useful than being entirely unable to try out the model.
It's a lot more useful than being entirely unable to try out the model.