> It seems like one needs a big machine farm and a vast corpus of training data with a lot of manual curation to get started creating a competitive LLM, plus whatever technical expertise that I don't even know about. The stuff that makes LLMs exist now and not earlier.
"big machine farm" reminds me of Folding@home, which needed the same and got it.
"manual curation" is what Wikipedia did, as well as the free software community.
"technical expertise" is present in the free software world too. It is scarce there because it is scarce in the world as a whole, but it exists.
"no Linus Torvalds figure" might be the main problem ATM.
I also thought of these after writing my comment. The main problems that I see with these solutions are:
- Training seems to need a lot of data available at the same time, which is difficult to handle on commodity hardware.
- Manual curation can be a mind-numbing task; it might need to be gamified somehow.
There is a chance that the curation could end up higher quality than the current corporate stuff. I'm pretty sure that writing like a TED talk is not an intrinsic property of LLMs.