Based on what works elsewhere in deep learning, I see no reason why you couldn't train once with a randomized number of experts, then set that number during inference based on your desired compute-accuracy tradeoff. I would expect that this has been done in the literature already.