Not training on data is a con for me, not a pro. The reason Claude is so good is RL training on users' chat histories and use cases. The era of training purely on public data is over: everyone has access to that data, yet only a few models are at the frontier.