Thanks for the quote; I couldn't find anything online.
Although it seems to me that the comparison is somewhat fragile: it was not possible to develop GNU any other way, whereas nowadays we could build local models entirely from scratch, unless I'm mistaken.
Small models were originally built by distilling from much larger models, training on synthetic material they generated, and using them to filter training data. There is a bit of a bootstrapping problem: to build a good LLM you need a working LLM, and if you don't have one the costs are absolutely eye-watering.
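To make the distillation part concrete, here is a rough sketch of a single logit-distillation training step. It assumes HuggingFace-style teacher/student causal LMs sharing a tokenizer; the names and the temperature are illustrative, not any particular recipe:

    import torch
    import torch.nn.functional as F

    def distill_step(student, teacher, input_ids, temperature=2.0):
        # The frozen teacher provides soft next-token distributions as targets.
        with torch.no_grad():
            teacher_logits = teacher(input_ids).logits / temperature
        student_logits = student(input_ids).logits / temperature

        # KL divergence pulls the student's distribution toward the
        # teacher's at every token position.
        loss = F.kl_div(
            F.log_softmax(student_logits, dim=-1),
            F.softmax(teacher_logits, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        loss.backward()
        return loss.item()

The point is that the big model's knowledge shows up as training signal for the small one, which is exactly why you need a working large model first.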
One observation is that an LLM is a next-token predictor, but if you train it on the internet, textbooks, etc., you get a predictor of that, and that isn't the behavior we actually want. None of those sources tend to contain "Solve this problem for me. OK, here is the solution:".
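A toy sketch of what closing that gap looks like in instruction-tuning data prep; the template and field names here are made up for illustration:

    def format_example(question, solution):
        # Wrap a raw Q/A pair in an "ask -> answer" template so the
        # next-token objective learns to produce answers on request.
        prompt = f"Solve this problem for me.\n{question}\n"
        completion = f"OK, here is the solution:\n{solution}"
        return {"prompt": prompt, "completion": completion}

The model is still just predicting the next token, but now the tokens it learns to predict are the completion conditioned on the request, which is the behavior we actually want.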
It wasn't physically impossible to start GNU the other way around, by bashing machine code into a system until you had a working operating system. But doing so would have been far less reasonable: much more expensive, much slower progress, and so on.