That's not a meaningful question as stated. Models can be quantized to fit much smaller memory footprints, and not all MoE layers (in MoE models) have to be kept in VRAM to maintain performance.
I mean 4-bit quantized. I can roughly estimate VRAM for dense models from the model size, but I don't know how to do it for MoE models.
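A back-of-the-envelope sketch (my own rough formula, not an official tool): weight memory is roughly total parameters × bits per weight / 8. The key point for MoE is that you count the *total* parameter count, not the active-per-token count, because every expert's weights must be resident unless you offload some layers to CPU RAM. This ignores KV cache and activation overhead, so treat the result as a lower bound.

```python
def weight_vram_gb(total_params: float, bits_per_weight: float = 4) -> float:
    """Rough weight-memory estimate in decimal GB.

    For MoE models, pass the TOTAL parameter count, not the
    active-per-token count: all experts sit in VRAM unless offloaded.
    Ignores KV cache, activations, and framework overhead.
    """
    return total_params * bits_per_weight / 8 / 1e9

# Illustrative numbers (parameter counts are approximate):
dense_7b = weight_vram_gb(7e9)      # ~3.5 GB of weights at 4-bit
moe_total = weight_vram_gb(46.7e9)  # ~23.4 GB for a Mixtral-style 8x7B,
                                    # even though only ~13B params are
                                    # active per token
```

So a 4-bit MoE model needs roughly the same weight VRAM as a dense model with the same *total* parameter count; the sparsity only saves compute, not resident memory.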