> I think there's no meaningful case, by the letter of the law, that using training data that includes GPL-licensed software in the models at the core of modern LLMs doesn't obligate every producer of such models to make both the models and the software stack supporting them available under the same terms.
Why do you think "fair use" doesn't apply in this case? The prior Bartz v. Anthropic ruling laid out pretty clearly how training an AI model falls within the realm of fair use. Authors Guild v. Google and Authors Guild v. HathiTrust were both decided much earlier, and both found that digitizing copyrighted works for the sake of making them searchable is sufficiently transformative to meet the standards of fair use. So what is it about GPL licensed software that you feel would make AI training on it not subject to the same copyright and fair use considerations that apply to books?
> So what is it about GPL licensed software that you feel would make AI training on it not subject to the same copyright and fair use considerations that apply to books?
The poster doesn't like it, so it's different. Most of the "legal analysis" and "foregone conclusions" in these types of discussions are vibes dressed up as objective declarations.
You sound like you're citing the general Internet understanding of "fair use", which seems to amount to "I can do whatever I like to any copyrighted content as long as maybe I mutilate it enough and shout 'FAIR USE!' loudly enough."
On the real measures of "fair use", at least in the US (https://fairuse.stanford.edu/overview/fair-use/four-factors/), I would contend that it absolutely face-plants on all four factors. The purpose and character of the use amounts to a replacement for the original; the nature of the work is creative expression that courts have repeatedly found copyrightable (with limited exceptions for purely informational bits of code); the "amount and substantiality" of the portion used is "all of it"; and the effect of the use on the market value of the original is devastating.
You may disagree. A long comment thread may ensue. However, all I really need for my point here is that it is far, far from obvious that waving the term "FAIR USE!" around is a sufficient defense. It would be a lengthy court case, not a slam-dunk "well duh, it's obviously fair use". The real "fair use" and the internet's "FAIR USE!" bear little resemblance to each other.
A sibling comment mentions Bartz v. Anthropic. Looking at the details of the case, I don't think it's obvious how to apply it, other than as proof that just because an AI company acquired some material in "some manner" doesn't mean it can do whatever it likes with it. The case ruled they still had to buy a copy. I can easily make a case that "buying a copy" of a GPLv2 codebase means "agreeing to the license", and that such an agreement could easily say "anything trained on this must also be released under GPLv2". It's a somewhat lengthy road to travel, where each step could fail, but the same can be said for the road to "just because I can lay my hands on it means I can feed it to my AI and 100% own the result", and that road has already had a step fail.
Broadly speaking, the GPL is a license with specific provisions about creating derivative software from the licensed work, and simply saying "fair use" doesn't exempt you from those provisions. More specifically, an advertised use case of the most popular closed models (in fact, arguably the main one at this stage) is producing code, and some of the code they were trained on is GPL licensed. As such, that code is part of the functionality of the program. The fact that the program was produced from the source code by a machine learning algorithm rather than by some other method doesn't change this fundamental fact.
The current Supreme Court may think that machine learning is some sort of magic exception, but they also seem to believe whatever oligarchs bribe them to believe. Again, I doubt the law will be enforced as written, but that has more to do with corruption than with any meaningful legal theory. Arguments against this claim seem to ignore that courts have already ruled that these systems have no intellectual property rights of their own, and the argument for fair use seems to rely pretty heavily on some handwavy anthropomorphization of the models.
Bartz v. Anthropic explicitly withheld ruling on fair use. It is not precedent here.