Hacker News

yekanchi · yesterday at 8:59 AM · 2 replies

I mean 4-bit quantized. I can roughly estimate VRAM for dense models from the model size, but I don't know how to do it for MoE models.


Replies

DiabloD3 · yesterday at 2:17 PM

Same calculation, basically. Any given ~30B model is going to use the same VRAM, assuming you load all of it into VRAM (which MoEs don't strictly need to do).
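As a rough sketch of that calculation (the function name and the overhead factor are my own assumptions, not anything from the thread):

```python
# Rough VRAM estimate for a quantized model's weights.
# Assumption: all parameters (including every expert in an MoE)
# are loaded into VRAM, so the formula is the same for dense and MoE.

def estimate_vram_gb(total_params_b: float,
                     bits_per_weight: float = 4.0,
                     overhead_factor: float = 1.2) -> float:
    """total_params_b: total parameter count in billions (e.g. 30 for a ~30B model).
    overhead_factor is a hypothetical fudge factor for KV cache,
    activations, and runtime buffers."""
    weight_bytes = total_params_b * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead_factor / 1e9  # gigabytes

# A ~30B model at 4-bit lands around 18 GB with this overhead guess,
# whether it's dense or MoE:
print(round(estimate_vram_gb(30), 1))  # → 18.0
```

The point is that total parameter count, not active parameter count, drives the weight footprint when everything sits in VRAM.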

EnPissant · yesterday at 9:12 AM

MoE models need just as much VRAM as dense models of the same total size, because every token may route to a different set of experts. They just run faster.
