
naasking • today at 3:55 AM

MoE isn't inherently better, but I do think it's still an underexplored space. When your sparse model can do 5 runs on the same prompt in the time a dense model takes to generate one, all sorts of interesting possibilities open up.
