> It can read the code? Historical discussions around it? Commit histories?
And if everyone bunkers up and all that open content dries up starting in 2026, let's say, what happens?
Well, that historical content and code still exists, right? Or are you just asking “what if we’re in a world of walled gardens now, and OSS dies because people don’t want their work stolen”? In that case: these companies will still get data, and they don’t need OSS anymore. It’s already web-crawled, licensed, or commissioned. They pay people to generate novel traces when they need them, or at the very least sets of prompts and tests for verification. Then the synthetic data that passes verification gets added to the training set.
It won't happen, for two reasons. One is that a great deal of open-source software and hobbyist knowledge sharing has never been driven by financial reward, and people will continue to do it anyway. Finer-grained controls over opt-outs would be great (the equivalent of a search engine 'nofollow' will hopefully come with time).
The other is that many kinds of technology have faced this kind of tragedy-of-the-commons argument in the past, and it has never been borne out. Printing presses copied manuscripts, search engines copied and indexed web pages, open-source software was incorporated into commercial products, Wikipedia repackaged knowledge produced elsewhere.
In almost all cases the total amount of creation increased because the technology lowered costs, expanded audiences, or created new forms of value. New 'View Source' material gets created faster than people pull back.