Hacker News

manytimesaway · yesterday at 10:32 AM · 1 reply · view on HN

Thanks for the quote, I couldn't find anything online.

Although it seems to me that the comparison is somewhat fragile: it was not possible to develop GNU anywhere else, whereas nowadays we could build local models entirely from scratch, unless I'm mistaken.


Replies

nullc · yesterday at 12:28 PM

Small models were originally built by distilling larger ones: generating synthetic training material with much larger models, and using those models to filter the training data. There's a bit of a bootstrapping problem: to build a good LLM you need a working LLM, and if you don't have one, the costs are absolutely eye-watering.
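To make the distillation idea concrete, here's a minimal sketch of Hinton-style logit distillation, where a student is penalized for diverging from a teacher's softened next-token distribution. This is an illustrative toy (plain NumPy, made-up logits), not any particular lab's pipeline:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) between temperature-softened distributions,
    # scaled by T^2 as in the classic distillation formulation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * temperature**2)

# Toy vocabulary of 5 tokens: the student's next-token distribution
# is pulled toward the teacher's, which carries more information per
# example than a one-hot "correct token" label would.
teacher = np.array([[4.0, 1.0, 0.5, 0.2, 0.1]])
student = np.array([[3.5, 1.2, 0.4, 0.3, 0.2]])
loss = distillation_loss(teacher, student)
```

In practice the same idea also shows up indirectly: the big model generates or scores text, and the small model just trains on the surviving data.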

One observation is that an LLM is a next-token predictor, but if you train it on the internet, textbooks, etc., you get a predictor of *that* — and that isn't the behavior we actually want. None of these sources tend to contain "Solve this problem for me. OK, here is the solution:".
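That data-format gap is what synthetic instruction data papers over. A toy illustration of the difference, with a hypothetical record of the kind a larger teacher model might generate (the field names and strings here are made up for illustration):

```python
# Raw pretraining text teaches the model to continue documents,
# not to answer requests.
raw_document = "The quicksort algorithm partitions an array around a pivot..."

# A synthetic instruction-tuning record reframes data as
# prompt -> desired-response pairs, the behavior we actually want.
synthetic_example = {
    "prompt": "Solve this problem for me: sort [3, 1, 2].",
    "response": "OK, here is the solution: [1, 2, 3].",
}

# What the model sees at training time is the concatenation,
# so "predict the next token" now means "produce the answer".
training_text = synthetic_example["prompt"] + "\n" + synthetic_example["response"]
```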

It wasn't physically impossible to start GNU the other way around, by bashing machine code into a system until you had a working operating system. But doing so would have been far less reasonable — much more expensive, and much slower to make progress.