When it comes to MoE, the model I remember is Mixtral, which for me was the first to show that MoE was viable. I was impressed by their technical report. To be clear, the MoE idea was already out there, if I am not mistaken. If they had pushed the Mixtral family further, who knows, they might have earned the kind of reputation the Qwen family has today. A missed opportunity.