>Chinese models use distillation but I don’t see them training models from scratch
Maybe because they don't have to. If someone else is doing the heavy lifting and they can just train on that model's outputs, it's a win for them.