"It is built end-to-end by Microsoft using clean and appropriately licensed data."
Well, there is still no list or publication of the training data.