I mean 4-bit quantized. I can roughly estimate VRAM for dense models from the model size, but I don't know how to do it for MoE models?
MoE models need just as much VRAM as a dense model with the same total parameter count, because every token may route to a different set of experts, so all expert weights have to stay loaded. They just run faster, since only a fraction of the parameters are active per token.
Same calculation, basically. Any ~30B model takes roughly the same VRAM regardless of architecture, assuming you load it all into VRAM (which MoEs don't strictly need to do, since inactive experts can be offloaded). The file size is the same either way.
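A minimal sketch of that calculation: weight memory is total parameters times bits per weight, plus some runtime overhead. The 1.2 overhead multiplier is an assumption standing in for KV cache, activations, and runtime buffers, which vary with context length and backend; the key point is that for MoE you plug in the total parameter count (e.g. ~47B for Mixtral 8x7B), not the active-per-token count.

```python
def estimate_vram_gb(total_params_b, bits=4, overhead=1.2):
    """Rough VRAM estimate for a quantized model.

    total_params_b: total parameters in billions. For MoE models,
        use the TOTAL count (all experts), not the active count.
    bits: quantization width (4 for 4-bit).
    overhead: assumed multiplier for KV cache and buffers
        (hypothetical value; depends on context length and backend).
    """
    weight_gb = total_params_b * bits / 8  # bits/8 = bytes per parameter
    return weight_gb * overhead

# A dense 30B and a 30B-total MoE land in the same ballpark:
print(estimate_vram_gb(30))  # 30 * 0.5 GB weights * 1.2 overhead = 18.0
```

So a 4-bit ~30B model needs on the order of 15 GB for weights alone, plus whatever your cache and context settings add on top.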