I still find the idea that "learning" from code is "stealing" kind of ridiculous.
Yes I guess there's also no such thing as stealing in torrents since the computer "learns" the data and returns it in a transcoded fashion so it's technically not a reproduction. Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".
The mental calisthenics required to justify this stuff must be exhausting.
I think it's absurd that we've jumped to the conclusion that backpropagation in neural networks should be legally treated the same as human learning.
I mean, I don't think I could find a better description for following the derivatives of error in reproducing a set of works than creating a "derivative work".
I find it more ridiculous to equate the act of a human learning with for-profit AI training without recompense to the authors of the training material.
Learning, probably not.
Copy/pasting at scale, yes
If that were the case, then imagine having to give it back!
If you can set a copyright trap and an LLM reproduces it I think it's pretty clear cut that it's more than just "learning".
I have seen LLMs do all sorts of crap which was clearly reproduction of training material.
This is also why people are most impressed by how much better it is at reproducing boilerplate than at, say, producing imaginative new ideas.
If I “learned” your essay and handed it in, would you be happy with that?
The "learning" isn't learning really. I mean it might be, but if you define learning to be a human endeavor than AI can't learn.
It's perfectly reasonable to say it's okay for humans to do something but not okay for a computer program to do the same thing. We don't have to equate AI with humans; that's a choice, and usually a bad one.