This would be a sort of convergence? They were both right in part (Chomsky that there was structure there, Norvig that it could be sussed out using brute force statistics). As is often the case, when two smart people who have thought a lot about something complicated disagree, the truth comes out when their unstated assumptions are finally exposed to the light.
In this case, Chomsky's LAD almost certainly relies on Baldwin-effect structure to get around the paucity of stimuli, and the LLMs are just getting to "the same place" through sheer masses of data.
This would be a sort of convergence? They were both right in part (Chomsky that there was structure there, Norvig that it could be sussed out using brute force statistics). As is often the case, when two smart people who have thought a lot about something complicated disagree, the truth comes out when their unstated assumptions are finally exposed to the light.
In this case, Chomsky's LAD almost certainly relies on Baldwin-effect structure to get around the paucity of stimuli, and the LLMs are just getting to "the same place" through sheer masses of data.