zozbot234 • today at 5:33 PM

Even low-VRAM cards are actually very useful for running the comparatively small dense layers in large local MoE models. This only requires transferring very small amounts of data across the PCIe bus (similar to pipeline parallelism), so it fits nicely around the existing bottlenecks on that hardware.
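
To put rough numbers on the "very small amounts of data" part, here is a back-of-the-envelope sketch in Python. It assumes that only the per-token hidden-state vector crosses the bus, twice per layer (GPU to host and back). The hidden size, layer count, precision, and PCIe bandwidth are all hypothetical values picked for illustration, not measurements of any particular model or card, and the estimate ignores per-transfer latency, which may matter more than bandwidth in practice.

```python
def pcie_bytes_per_token(hidden_size, bytes_per_value, num_layers, crossings_per_layer=2):
    """Bytes of activations moved over PCIe to generate one token.

    Assumes only the hidden-state vector crosses the bus, and that it
    crosses `crossings_per_layer` times per layer (hypothetical: once
    to the host for the expert layers, once back to the GPU).
    """
    return hidden_size * bytes_per_value * num_layers * crossings_per_layer


def transfer_seconds(num_bytes, bandwidth_bytes_per_s):
    """Pure bandwidth cost of moving num_bytes; ignores transfer latency."""
    return num_bytes / bandwidth_bytes_per_s


# Hypothetical model and bus parameters, for illustration only.
HIDDEN_SIZE = 4096        # width of the hidden state
BYTES_PER_VALUE = 2       # fp16 / bf16 activations
NUM_LAYERS = 32           # transformer layers
PCIE_BANDWIDTH = 16e9     # assumed effective bandwidth, bytes per second

per_token = pcie_bytes_per_token(HIDDEN_SIZE, BYTES_PER_VALUE, NUM_LAYERS)
seconds = transfer_seconds(per_token, PCIE_BANDWIDTH)

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{seconds * 1e6:.1f} microseconds of bus time per token")
```

With these made-up numbers the bus traffic comes to about half a megabyte per token, which is why the weights for the dense layers can stay on the card while almost nothing else has to move.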
