
cphoover yesterday at 5:02 PM (5 replies)

5-10% accuracy is the difference between a usable model and an unusable one.


Replies

samwho yesterday at 5:22 PM

It definitely could be, but in the time I spent talking to the 4-bit models compared to the 16-bit original, they still seemed surprisingly capable. I do recommend benchmarking quantized models on the specific tasks you care about.
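As an aside, you can get a rough feel for how lossy 4-bit quantization is without downloading anything. This is a minimal sketch of blockwise symmetric 4-bit quantization (a simplified stand-in, not the actual NF4 scheme used by libraries like bitsandbytes) that measures the relative error it introduces on random weights:

```python
import numpy as np

def quantize_4bit(w, block=32):
    """Blockwise symmetric 4-bit quantization: each block of `block`
    weights shares one float scale; values round to ints in [-7, 7]."""
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -7, 7)
    return (q * scale).reshape(-1)  # dequantized approximation

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
w_hat = quantize_4bit(w)
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative weight error: {rel_err:.3f}")
```

Weight-level error of a few percent is typical; whether that translates into a 5-10% drop on your task is exactly why benchmarking on your own workload matters.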

djsjajah today at 1:35 AM

Yes, but the difference between one model and one 4x larger is usually a lot more than that.

It is not a question of whether to run Qwen 8B at bf16 or a quantized version of it. It is more a question of whether to run Qwen 8B at full precision or a quantized version of Qwen 27B.

You will find that you are usually better off with the larger model.
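The memory arithmetic behind this trade-off is simple: weight memory is roughly parameter count times bits per weight. A quick sketch (treating the comment's 27B figure as a hypothetical model size, and ignoring KV cache and runtime overhead):

```python
def weight_gb(params_billions, bits):
    """Approximate weight memory in GB: params * bits / 8 bytes each.
    Ignores KV cache, activations, and framework overhead."""
    return params_billions * bits / 8

# An 8B model at bf16 (16 bits/weight) vs a 27B model at 4 bits/weight:
small_fp = weight_gb(8, 16)    # 16.0 GB
large_q4 = weight_gb(27, 4)    # 13.5 GB
print(f"8B @ bf16 : {small_fp:.1f} GB")
print(f"27B @ 4-bit: {large_q4:.1f} GB")
```

So the 4-bit quantized model with over 3x the parameters actually fits in less memory than the small full-precision one, which is why the comparison usually favors the larger quantized model.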

amelius yesterday at 7:36 PM

Yes, I was wondering why they mentioned those numbers without mentioning their practical significance.
