Hacker News

zozbot234, yesterday at 6:03 PM

With sparse MoE models it's worth running the experts from system RAM, since that lets you mmap the weights transparently and leave inactive experts on disk. Of course that's also a slowdown unless you have enough RAM for the full set, because any expert that isn't resident has to be paged in from disk when it gets routed to, but it lets you run much larger models on smaller systems.
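
To make the mmap point concrete, here's a rough sketch of the idea using only the Python standard library. It isn't how any particular inference engine does it: the file layout (one flat file of contiguous float32 blocks, one per expert), the sizes, and the names `write_dummy_experts`, `expert_forward` and `run_demo` are all made up for illustration. The point is just that mapping the file costs nothing up front, and only the experts you actually read get faulted in from disk by the OS.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical sizes, picked only to keep the demo small.
NUM_EXPERTS = 8
EXPERT_FLOATS = 1024              # float32 weights per expert
EXPERT_BYTES = EXPERT_FLOATS * 4  # bytes per expert block


def write_dummy_experts(path):
    """Write NUM_EXPERTS contiguous blocks; expert i is filled with float(i)."""
    with open(path, "wb") as f:
        for i in range(NUM_EXPERTS):
            block = [float(i)] * EXPERT_FLOATS
            f.write(struct.pack(f"{EXPERT_FLOATS}f", *block))


def expert_forward(mm, idx, x):
    """Toy 'expert': dot product of x with that expert's weights.

    Reading from the mapping at this offset is what makes the OS page in
    this expert's bytes; blocks that are never read stay on disk.
    """
    weights = struct.unpack_from(f"{EXPERT_FLOATS}f", mm, idx * EXPERT_BYTES)
    return sum(w * xi for w, xi in zip(weights, x))


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "experts.bin")
        write_dummy_experts(path)
        with open(path, "rb") as f:
            # Length 0 maps the whole file; nothing is read from disk yet.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                x = [1.0] * EXPERT_FLOATS
                routed = [2, 5]  # pretend the router picked these experts
                return {i: expert_forward(mm, i, x) for i in routed}


if __name__ == "__main__":
    print(run_demo())
```

If the whole file fits in the page cache, repeated reads of the same expert are served from RAM; if it doesn't, the kernel evicts cold pages and re-reads them on demand, which is where the slowdown comes from.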