Hacker News

0x457, yesterday at 5:04 PM

What does MoE have to do with it? Aside from Flash-MoE, which supports exactly one model and only on macOS, you still need to load the entire model into memory. You also don't know which experts are going to be activated, so it's not like you can predict which ones need to be loaded.
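To make the routing point concrete, here is a rough sketch of top-k expert selection in plain Python. It is a toy, not any real model's router: the gate weights are random, and `top_k_experts`, `NUM_EXPERTS`, and `DIM` are names made up for illustration. The only thing it shows is that the chosen experts are computed from each token's hidden state, so they aren't known before that token reaches the layer.

```python
import random


def top_k_experts(hidden, gate_weights, k=2):
    """Return the indices of the k experts with the highest router scores."""
    # One score per expert: dot product of the token's hidden state
    # with that expert's row of the (hypothetical) gate matrix.
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in gate_weights]
    # Softmax is monotonic, so ranking the raw scores picks the same experts.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    NUM_EXPERTS, DIM = 8, 16  # assumed toy sizes
    gate = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

    used = set()
    for _ in range(32):  # 32 made-up token hidden states
        token = [rng.gauss(0, 1) for _ in range(DIM)]
        used.update(top_k_experts(token, gate))

    # Each token activates only k experts, but the union over many tokens
    # is what would have to be resident in memory.
    print(f"{len(used)} of {NUM_EXPERTS} experts touched:", sorted(used))
```

Each token only touches `k` experts, but the set changes from token to token, and that is why you can't just keep a fixed subset loaded.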