> It's something in instruction-tuning that's doing it.
Isn't instruction tuning done with huge amounts of synthetic data? I wonder if the lack of diversity comes from the LLM-generated data used for instruction tuning.