I mean 4-bit quantized. I can roughly estimate VRAM for dense models from the model size, but I don't know how to do it for MoE models?
MoE models need just as much VRAM as a dense model with the same total parameter count, because every token may route to a different set of experts, so all expert weights have to stay loaded. They just run faster, since only a fraction of the parameters are active per token.
Same calculation, basically. Any ~30B model takes roughly the same VRAM regardless of architecture, assuming you load it all into VRAM (which MoEs don't strictly need to do, since inactive experts can be offloaded). The file size is the same either way.
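A minimal sketch of that calculation: weight memory is total parameters times bits per weight, plus some runtime overhead. The 1.2 overhead multiplier is an assumption standing in for KV cache, activations, and runtime buffers, which vary with context length and backend; the key point is that for MoE you plug in the total parameter count (e.g. ~47B for Mixtral 8x7B), not the active-per-token count.

```python
def estimate_vram_gb(total_params_b, bits=4, overhead=1.2):
    """Rough VRAM estimate for a quantized model.

    total_params_b: total parameters in billions. For MoE models,
        use the TOTAL count (all experts), not the active count.
    bits: quantization width (4 for 4-bit).
    overhead: assumed multiplier for KV cache and buffers
        (hypothetical value; depends on context length and backend).
    """
    weight_gb = total_params_b * bits / 8  # bits/8 = bytes per parameter
    return weight_gb * overhead

# A dense 30B and a 30B-total MoE land in the same ballpark:
print(estimate_vram_gb(30))  # 30 * 0.5 GB weights * 1.2 overhead = 18.0
```

So a 4-bit ~30B model needs on the order of 15 GB for weights alone, plus whatever your cache and context settings add on top.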