I assume these are just extra output layers trained on the hidden state from the larger model - that's how MTP (multi-token prediction) works. It's not a separate drafting model.