
munk-a yesterday at 10:38 PM

It's absolutely prohibited to copy and redistribute for commercial purposes materials that you're unlicensed to do so with. This isn't an issue in the copyleft scenario (though it may impose transitive licensing requirements on the copier that LLM runners don't want to follow), but it is a huge issue that has come up with LLM training.

LLM training involves ingesting works (in a potentially transformative process) and partially reproducing them; that's a generally restricted action when it comes to licensing.


Replies

crazygringo yesterday at 10:56 PM

> It's absolutely prohibited to copy and redistribute for commercial purposes materials that you're unlicensed to do so with.

Sure, but that's not what LLMs generally do, and it's certainly not what they're intended to do.

The LLM companies, and many other people, argue that training falls under fair use. One element of fair use is whether the purpose/character of the use is sufficiently transformative, and their argument is that turning texts into weights, without even a remote 1-1 correspondence, is that transformation.

And this is why LLM companies ensure that partial reproduction doesn't happen during LLM usage, using a kind of copyrighted-text filter as a last check in case anything would unintentionally get through. (And it doesn't even tend to occur in the first place, except when the LLM is trained on a bunch of copies of the same text.)
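To make the "last check" idea concrete, here is a rough sketch of one way such a filter could work: flag any output that shares a long run of consecutive words with a known protected text. This is purely an illustration and not any vendor's actual filter; the function names, the word-level n-gram approach, and the default window of 8 words are all assumptions made for the example.

```python
def ngrams(text, n):
    """Return the set of n-word sequences in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    # If the text has fewer than n words, the range is empty and so is the set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlaps_protected(output, protected_texts, n=8):
    """Hypothetical check: True if output shares any n-word run with a protected text."""
    out = ngrams(output, n)
    return any(out & ngrams(p, n) for p in protected_texts)


# Made-up stand-in for a corpus of protected texts.
protected = ["the quick brown fox jumps over the lazy dog near the quiet river bank"]

copied = "He wrote: quick brown fox jumps over the lazy dog near a barn."
original = "A slow red fox walked around the sleeping dog by the river."

print(overlaps_protected(copied, protected))
print(overlaps_protected(original, protected))
```

A real system would need to handle paraphrase, punctuation, and scale (for example by hashing n-grams rather than rebuilding sets per call), so treat this only as a picture of the basic mechanism.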
