naasking, today at 3:55 AM

MoE isn't inherently better, but I do think it's still an underexplored space. When your sparse model can do 5 runs on the same prompt in the time a dense model takes to generate one, that opens up all sorts of interesting possibilities.