#2 is not that surprising from first principles if you built the bigger model by feeding it poorer-quality training data because it’s the only way you can get enough