Hacker News

gspr yesterday at 9:43 PM

> Training on copyleft licensed code is not a license violation. Any more than a person reading it is.

Some might hold that we've granted persons certain exemptions, on account of them being persons. We do not have to grant machines the same.

> In copyright terms, it's such an extreme transformative use that copyright no longer applies.

Has the model really performed an extreme transformation if it is able to produce the training data near-verbatim? Sure, it can also produce extremely transformed versions, but is that really relevant if it holds within it enough information for a (near-)verbatim reproduction?


Replies

NewsaHackO yesterday at 10:25 PM

> Has the model really performed an extreme transformation if it is able to produce the training data near-verbatim? Sure, it can also produce extremely transformed versions, but is that really relevant if it holds within it enough information for a (near-)verbatim reproduction?

I feel as though, from an information-theoretic standpoint, it isn't possible for an LLM (which is almost certainly under 1 TB in size) to contain any substantial verbatim portion of its training corpus, which includes audio, images, and videos.
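
A rough back-of-the-envelope sketch of that size argument, in Python. Only the 1 TB model size comes from the comment above; the corpus size and the lossless compression ratio are hypothetical placeholders, and `max_verbatim_fraction` is a name made up for illustration. It treats the weights as an ideal storage medium, so it gives a loose upper bound, not a measurement of what any real model memorizes.

```python
def max_verbatim_fraction(model_bytes: float,
                          corpus_bytes: float,
                          compression_ratio: float = 1.0) -> float:
    """Upper bound on the share of a corpus that could be stored losslessly
    in model_bytes, if every byte of the model were used purely as storage."""
    if model_bytes <= 0 or corpus_bytes <= 0 or compression_ratio <= 0:
        raise ValueError("all sizes and the ratio must be positive")
    storable_bytes = model_bytes * compression_ratio
    return min(1.0, storable_bytes / corpus_bytes)


TB = 10**12

model_size = 1 * TB      # upper figure quoted in the comment
corpus_size = 100 * TB   # ASSUMPTION: hypothetical corpus size
ratio = 4.0              # ASSUMPTION: hypothetical lossless compression ratio

fraction = max_verbatim_fraction(model_size, corpus_size, ratio)
print(f"At most {fraction:.1%} of the corpus could be held verbatim")
```

Under these assumed numbers the bound is a few percent of the corpus. Note that a bound on the whole corpus does not say which parts could be retained, so it does not by itself rule out near-verbatim recall of a small subset.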

crazygringo yesterday at 10:17 PM

> We do not have to grant machines the same.

No, we don't have to, but so far we do, because that's the most legally consistent approach. If you want to change that, you're going to need to pass new laws that may wind up radically redefining intellectual property.

> Has the model really performed an extreme transformation if it is able to produce the training data near-verbatim?

Of course it has, if the transformation is extreme, as it appears to be here. If I memorize the lyrics to a bunch of love songs, and then write my own love song where every line is new, nobody's going to successfully sue me just because I can sing a bunch of other songs from memory.

Also, it's not even remotely clear that the LLM can produce the training data near-verbatim. Generally it can't, unless it's something that it's been trained on with high levels of repetition.
