Hacker News

EnPissant, yesterday at 6:16 PM

I don't mean to be a jerk, but a 2-bit quant, experts reduced from 10 to 4, no way of knowing whether the test ran long enough for the SSD to thermal throttle, and still only 5.5 tokens/s: that does not sound useful to me.


Replies

simonw, yesterday at 7:26 PM

It's a lot more useful than being entirely unable to try out the model.
