djsjajah • today at 1:35 AM

Yes, but the difference between one model and one 4x larger is usually a lot more than that.

It is not a question of whether I run Qwen 8b at bf16 or a quantized version. It is more a question of whether I run Qwen 8b at full precision or a quantized version of Qwen 27b.

You will find that you are usually better off with the larger model.
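
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch of weight memory only. The 4-bit quantization level is my assumption (the comment does not name one), the parameter counts are taken as a nominal 8e9 and 27e9, and the helper `weight_memory_gb` is made up for illustration. It ignores KV cache, activations, and the per-group scale overhead that real quantized formats add.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts; actual checkpoints differ somewhat.
small_bf16 = weight_memory_gb(8e9, 16)  # 8b model at bf16 (16 bits per weight)
large_q4 = weight_memory_gb(27e9, 4)    # 27b model at an assumed 4 bits per weight

print(f"8b  @ bf16 : {small_bf16:.1f} GB")
print(f"27b @ 4-bit: {large_q4:.1f} GB")
```

Under those assumptions the quantized 27b model needs about the same weight memory as the 8b model at bf16, or slightly less, which is why the two are realistic alternatives on the same hardware.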
