It's on the page:

Precision  Quantization Tag  File Size
1-bit      UD-IQ1_M          10 GB
2-bit      UD-IQ2_XXS        10.8 GB
           UD-Q2_K_XL        12.3 GB
3-bit      UD-IQ3_XXS        13.2 GB
           UD-Q3_K_XL        16.8 GB
4-bit      UD-IQ4_XS         17.7 GB
           UD-Q4_K_XL        22.4 GB
5-bit      UD-Q5_K_XL        26.6 GB
16-bit     BF16              69.4 GB

I really want to know what M, K, XL, and XS mean in this context, and how to choose. I searched all the Unsloth docs and there seems to be no explanation at all.
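For what it's worth, you can back out the effective bits per weight from the table itself. A rough sketch; it assumes the listed sizes are decimal GB and that the BF16 file stores 2 bytes per parameter (so roughly 69.4e9 / 2 ≈ 34.7B parameters):

```python
# Rough effective bits-per-weight, derived from the table above.
# Assumptions (not stated in the docs): sizes are decimal GB, and the
# BF16 file is ~2 bytes/parameter, implying ~34.7B total parameters.
BF16_SIZE_GB = 69.4
params = BF16_SIZE_GB * 1e9 / 2  # approximate total parameter count

sizes_gb = {
    "UD-IQ1_M":   10.0,
    "UD-IQ2_XXS": 10.8,
    "UD-Q2_K_XL": 12.3,
    "UD-IQ3_XXS": 13.2,
    "UD-Q3_K_XL": 16.8,
    "UD-IQ4_XS":  17.7,
    "UD-Q4_K_XL": 22.4,
    "UD-Q5_K_XL": 26.6,
}

for tag, gb in sizes_gb.items():
    bpw = gb * 1e9 * 8 / params  # average bits per weight across all tensors
    print(f"{tag:12s} ~{bpw:.2f} bits/weight")
```

Note that the "1-bit" file averages well above 1 bit per weight, which is expected if some tensors are kept at higher precision than the headline bit level.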
Thanks! I'd scanned the main content but I'd been blind to the sidebar on the far right.
"16-bit BF16 69.4 GB"
Is that (BF16) a 16-bit float?
Additional VRAM is needed for context.
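A rough way to size that context overhead is the usual KV-cache formula. This is only a sketch; the layer/head numbers below are placeholders for illustration, not this model's real config (read the actual values from the GGUF metadata):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Approximate KV-cache size: one K and one V tensor per layer,
    each (n_kv_heads * head_dim) wide per token, at 16-bit precision
    by default."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical config, for illustration only:
gib = kv_cache_bytes(n_layers=48, n_kv_heads=4, head_dim=128, ctx_len=32768) / 2**30
print(f"~{gib:.1f} GiB of VRAM for a 32k context")
```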
This model is a MoE model with only 3B parameters active per token, which works well with partial CPU offload. So in practice you can run the -A(N)B models on systems with a little less VRAM than the file size suggests. The more you offload to the CPU, though, the slower it gets.
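The offload trade-off can be sketched numerically. This is a toy estimate, not llama.cpp's actual memory allocator; `vram_gb`, the fixed overhead, and the 48-layer count are all assumptions for illustration:

```python
def layers_on_gpu(file_size_gb, n_layers, vram_gb, overhead_gb=2.0):
    """Toy estimate: treat the model as evenly sized layers and fill
    whatever VRAM remains after a fixed overhead for context/buffers.
    The rest of the layers run on the CPU."""
    per_layer_gb = file_size_gb / n_layers
    budget = max(0.0, vram_gb - overhead_gb)
    return min(n_layers, int(budget / per_layer_gb))

# e.g. the 17.7 GB UD-IQ4_XS file on a 16 GB GPU (48 layers assumed):
print(layers_on_gpu(17.7, 48, 16.0))
```

In llama.cpp the resulting number is roughly what you would pass as `--n-gpu-layers`, then adjust up or down based on actual usage.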