Depending on what you're using the synthetic data for, it is sometimes called distillation. Here is a robust example, DataDreamer, from some UPenn students: https://datadreamer.dev/
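
In case a concrete picture helps, here is a toy sketch of the distillation idea: a "teacher" labels synthetic inputs, and a smaller "student" is fit on those labels. Everything in it is an assumption for illustration. `teacher`, `make_synthetic_dataset`, and `fit_student` are hypothetical names, the teacher is a plain function standing in for a large model, and none of this uses or reflects the DataDreamer API.

```python
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for a large, expensive model."""
    return 3.0 * x + 2.0


def make_synthetic_dataset(n: int, seed: int = 0):
    """Sample random inputs and label them with the teacher's outputs."""
    rng = random.Random(seed)
    inputs = [rng.uniform(-5.0, 5.0) for _ in range(n)]
    return [(x, teacher(x)) for x in inputs]


def fit_student(pairs):
    """Fit a small student y = w*x + b by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# The student never sees "real" labels, only the teacher's outputs.
dataset = make_synthetic_dataset(200)
w, b = fit_student(dataset)
print(f"student learned w={w:.3f}, b={b:.3f}")
```

With a real setup the teacher would be a large language model and the student a smaller one, but the shape is the same: generate inputs, label them with the teacher, train the student on the result.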