Being grafted onto the main model reduces the layer duplication you'd otherwise have, at least for Step and Qwen 3.6.