> It's absolutely prohibited to copy and redistribute for commercial purposes materials that you're unlicensed to do so with.
Sure, but that's not what LLMs generally do, and it's certainly not what they're intended to do.
The LLM companies, and many other people, argue that training falls under fair use. One element of fair use is whether the purpose and character of the use are sufficiently transformative, and the argument is that turning texts into weights, with nothing close to a 1-1 correspondence between the two, is that transformation.
And this is why LLM companies try to ensure that partial reproduction doesn't happen during LLM usage, using a kind of copyrighted-text filter as a last check in case anything unintentionally gets through. (And it doesn't even tend to occur in the first place, except when the LLM was trained on a bunch of copies of the same text.)
Yeah, at the end of the day a big part of this question comes down to whether that copying is fair use. That is an open question, with the transformative nature being the primary point in the LLMs' favor. But it is still copying from some works into another: without a fair use exception, it absolutely violates the licensing of most of the training data. It's a bit different from previously settled case law because it copies so little from so many billions of different things. I think blocking reproduction is a wise move by LLM companies for PR purposes, but it doesn't guarantee that training is a license-exempt activity.