Hacker News

DiabloD3 · yesterday at 8:51 AM

That's not a meaningful question as asked. Models can be quantized to fit into a much smaller memory footprint, and not every expert layer in an MoE model has to reside in VRAM to maintain performance (inactive experts can be offloaded to system RAM).


Replies

yekanchi · yesterday at 8:59 AM

I mean 4-bit quantized. I can roughly estimate VRAM for dense models from the model size, but I don't know how to do it for MoE models.
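A rough back-of-envelope sketch for the estimate being discussed (assumptions: ~0.5 bytes per parameter at 4-bit, an arbitrary ~10% overhead factor, and no accounting for KV cache or activations; the Mixtral figures below are its published total/active parameter counts):

```python
def vram_gb(total_params_billion: float, bits: int = 4, overhead: float = 1.1) -> float:
    """Rough weight-memory estimate in GB for a quantized model."""
    bytes_per_param = bits / 8
    return total_params_billion * bytes_per_param * overhead

# Dense model: use the full parameter count.
dense_7b = vram_gb(7)  # roughly 3.9 GB of weights

# MoE model: if everything stays in VRAM, the same formula applies to the
# TOTAL parameter count, not the active count. E.g. Mixtral 8x7B has
# ~46.7B total parameters but only ~12.9B active per token.
moe_total = vram_gb(46.7)   # memory scales with total params
moe_active = vram_gb(12.9)  # compute per token scales with active params
```

The key point: the active parameter count determines per-token compute (why MoE inference is fast), but VRAM still scales with the total count unless inactive experts are offloaded to system RAM.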
