EnPissant • yesterday at 6:16 PM

I don't mean to be a jerk, but with a 2-bit quant, experts reduced from 10 to 4, and no way of knowing whether the test ran long enough for the SSD to thermal throttle, still only getting 5.5 tokens/s does not sound useful to me.

Replies

simonw • yesterday at 7:26 PM

It's a lot more useful than being entirely unable to try out the model.
