I'm sorry, but you got the terminology exactly backwards. Training on the answer is called supervised fine-tuning.
Just for the sake of clarity:
0. Full distillation uses the teacher model's logits, which carry much more information than the text itself. This is the kind of distillation used inside labs, but one can't distill Claude this way, as logits are not available via the API.
1. Supervised fine-tuning on synthetic data might be called black-box distillation. I guess that's what you meant in your case (1).
2. Reinforcement learning (like RLAIF) uses the least amount of information from the teacher, i.e. only a few bits per task.
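
To make the difference in information concrete, here is a rough sketch of the training signal in each of the three cases above. It is a toy illustration in plain Python, not anyone's actual training code: the function names and the tiny vocabulary are made up for the example, and the "bits" figures are simple upper bounds.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_distillation_loss(teacher_logits, student_logits):
    """Case 0: KL(teacher || student) over the whole vocabulary at one position.
    The student sees the teacher's full distribution, one number per vocab entry."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)

def sft_loss(teacher_token_id, student_logits):
    """Case 1: cross-entropy against the single token the teacher actually emitted.
    The student sees only a sampled token, not the distribution behind it."""
    q = softmax(student_logits)
    return -math.log(q[teacher_token_id])

def max_bits_per_token(vocab_size):
    """Upper bound on the information in one sampled token."""
    return math.log2(vocab_size)

def max_bits_per_reward(num_reward_levels):
    """Case 2: upper bound on the information in one scalar reward per task,
    assuming the reward takes one of num_reward_levels discrete values."""
    return math.log2(num_reward_levels)

# Hypothetical 4-token vocabulary, for illustration only.
teacher_logits = [2.0, 1.0, 0.5, -1.0]
student_logits = [0.0, 0.0, 0.0, 0.0]

print("logit distillation loss:", logit_distillation_loss(teacher_logits, student_logits))
print("SFT loss on teacher's token 0:", sft_loss(0, student_logits))
print("max bits per token, 4-token vocab:", max_bits_per_token(4))
print("max bits per pass/fail reward:", max_bits_per_reward(2))
```

The point is only the ordering: a full distribution per token, then one token per position, then a handful of bits for an entire task.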