
marci · today at 6:56 AM

"That’s where EMO comes in.

We show that EMO – a 1B-active, 14B-total-parameter (8-expert active, 128-expert total) MoE trained on 1 trillion tokens – supports selective expert use: for a given task or domain, we can use only a small subset of experts (just 12.5% of total experts) while retaining near full-model performance."

https://allenai.org/blog/emo
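
For a concrete sense of what "selective expert use" could mean mechanically, here is a minimal sketch of restricting a standard top-k softmax MoE router to a fixed expert subset. The function name, the logit-masking approach, and the random subset are illustrative assumptions on my part, not necessarily how the EMO authors select or prune experts:

    import torch

    def route_with_subset(router_logits, allowed_experts, k=8):
        """Top-k MoE routing restricted to a fixed expert subset.

        router_logits:   (tokens, num_experts) raw router scores
        allowed_experts: 1-D LongTensor of expert ids kept for this domain
        k:               experts activated per token (8 in the quoted setup)
        """
        num_experts = router_logits.size(-1)
        # Mask out every expert outside the allowed subset.
        mask = torch.full((num_experts,), float("-inf"))
        mask[allowed_experts] = 0.0
        masked = router_logits + mask
        # Standard top-k selection, then softmax over the survivors.
        weights, ids = masked.topk(k, dim=-1)
        return torch.softmax(weights, dim=-1), ids

    # e.g. keep 16 of 128 experts (12.5%) for a given domain
    logits = torch.randn(4, 128)
    subset = torch.randperm(128)[:16]
    w, ids = route_with_subset(logits, subset)

In practice you'd choose the subset per task or domain (say, by measuring expert activation frequency on held-out data) rather than at random; randperm here is just to keep the example self-contained.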