No. There's no "answer" really.
They use self-distillation to shift the model's output distribution towards the one the same model produces when it's sampled with different temperature/truncation settings.
This effectively "folds" the logit tail truncation behavior into the model itself.
In what it does, it's not entirely unlike a few "model-controlled sampling settings" things I've seen, but the execution is different.
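
To make the "folding" part concrete, here's a toy sketch of the idea, not their actual training setup. Assumptions on my part: the "model" is just a raw logit vector for one context, top-p stands in for whatever truncation they use, the 0.7 / 0.9 settings are arbitrary, and all function names are made up for illustration. The teacher target is the same logits run through temperature + truncation, and the student (plain temperature 1, no truncation) is nudged towards it.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_truncate(probs, top_p):
    # Keep the smallest set of most likely tokens whose mass reaches top_p,
    # zero out the rest, and renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / mass
    return out

def teacher_target(logits, temperature, top_p):
    # Hypothetical teacher: the same logits, but sampled "with settings".
    return top_p_truncate(softmax(logits, temperature), top_p)

def cross_entropy(target, logits):
    student = softmax(logits)  # student runs at temperature 1, no truncation
    return -sum(t * math.log(s) for t, s in zip(target, student) if t > 0.0)

def distill_step(logits, target, lr=1.0):
    # Gradient of the cross-entropy w.r.t. the logits is (student - target).
    student = softmax(logits)
    return [x - lr * (s - t) for x, s, t in zip(logits, student, target)]

logits = [2.0, 1.0, 0.0, -1.0]
target = teacher_target(logits, temperature=0.7, top_p=0.9)  # frozen target
print("before:", [round(p, 4) for p in softmax(logits)])
for _ in range(200):
    logits = distill_step(logits, target)
print("after: ", [round(p, 4) for p in softmax(logits)])
print("target:", [round(p, 4) for p in target])
```

After the loop, plain sampling from the updated logits puts much less mass on the tokens the truncation would have cut, which is the sense in which the truncation behavior ends up inside the model. A real setup would presumably fine-tune the network's weights on its own outputs rather than update a logit vector directly.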