Just get the bigger models to figure out the architecture required for hot-swappable sub-experts without loss of performance!
We've got all those tokens; isn’t that the point of auto research and friends?
(Only sort of joking).