> Uh, that is exactly what a derivative work is.
No, it isn't. A derivative work isn't something based on extracting underlying ideas or patterns from another work; it's something that includes copyrighted portions of the other work.
An annotated edition of Hamlet is a derivative work. A CliffsNotes summary of Hamlet is a derivative work.
Strange Brew and The Lion King are not derivative works of Hamlet simply because they include literary themes and plot points that originated in Hamlet. A list of word counts of popular works of literature that includes an entry for Hamlet is also not a derivative work. The Markov chain described above is not a derivative work.
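To make the word-count and Markov chain examples concrete, here is a rough sketch of what such statistics look like once extracted. The function names and the sample string are my own assumptions for illustration, and the bigram model is just one simple way to build a Markov chain, not necessarily the one described above.

```python
from collections import Counter, defaultdict

def word_counts(text):
    # Hypothetical helper: tally how often each word appears.
    return Counter(text.lower().split())

def markov_table(text):
    # Hypothetical helper: for each word, count which words follow it.
    # This is a first-order (bigram) model, chosen only for simplicity.
    words = text.lower().split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table

# Illustrative sample text, not a full work.
sample = "to be or not to be"
counts = word_counts(sample)
table = markov_table(sample)
```

What comes out is a set of tallies: `counts` maps each word to a frequency, and `table` maps each word to counts of the words that followed it.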
> The obvious follow up here is whether an LLM is creating transformative derivations or not. A lot of folks argue that yes, an LLM spitting out statistically sampled code that matches existing code is not transformative and is (or might be) infringing the terms of the license it was released under.
And I would agree with them. An LLM that is actually outputting non-trivial code that matches a public project's code verbatim is engaging in copying, not stochastic inference.
> I think it's a pretty obvious "somewhere in the middle" that is gonna make a bunch of lawyers a whole lot of money.
It's a shame that the same fundamental questions have to be relitigated over and over again just because the contextual formalities and modes of expression have changed. I wonder how many of the legal cases are going to be copies or derivative works of previous ones.