logoalt Hacker News

munk-ayesterday at 8:56 PM2 repliesview on HN

I think the missing thing here is that the license violation already happened. Most of the big models trained on data in a manner that violated terms of service. We'll need a court case but I think it's extremely reasonable to consider any model trained on GPL code to be infected with open licensing requirements.


Replies

crazygringoyesterday at 9:12 PM

You might wish that were true, but there are very strong arguments it's not. Training on copyleft licensed code is not a license violation. Any more than a person reading it is. In copyright terms, it's such an extreme transformative use that copyright no longer applies. It's fair use.

But agreed that we're waiting for a court case to confirm that. Although really, the main questions for any court cases are not going to be around the principle of fair use itself or whether training is transformative enough (it obviously is), but rather on the specifics:

1) Was any copyrighted material acquired legally (not applicable here), and

2) Is the LLM always providing a unique expression (e.g. not regurgitating books or libraries verbatim)

And in this particular case, they confirmed that the new implementation is 98.7% unique.

show 4 replies
NewsaHackOyesterday at 9:09 PM

I agree there has to be a court case about it. I think the current argument, however, is that it is transformative, and therefore falls under fair use.

show 2 replies