Personally, I think that the human directing the agent owns the copyright for whatever is produced, but the ability for the agent to build it in the first place is based off of stolen IP.
I'm concerned about the copyright 'washing' this enables though, especially in OSS, and I think the right thing for OSS devs to do is to try to publish resulting code with the strongest copyleft licensing that they are comfortable with - https://jackson.dev/post/moral-ai-licensing/
Copyright isn't some natural state of being though, it's something that's granted to people by the government to "promote the progress of science and useful arts". If copyright hinders things then I think it's reasonable that exceptions would be made.
Copyright laundering is an illusion.
If the LLM generates output that a court decides is sufficiently derivative, and especially (but not necessarily) if the LLM was trained on the source material being infringed, then whoever redistributes the derivative output is going to be liable for copyright infringement.
Creation of the LLM itself is transformative, but LLM output which infringes is not.
Do you think the human directing the agent owns the copyright for any legal reason?
Community for Creative Non-Violence v. Reid (https://en.wikipedia.org/wiki/Community_for_Creative_Non-Vio...) is a Supreme Court decision establishing that commissioning a work and directing its author does not by itself make the commissioner the author; authorship goes to the person actually doing the work.
The author can transfer copyright to the commissioner by contract, but the monkey picture (and other cases) have established that only humans can hold copyright. Since LLMs aren't human, they can't hold copyright, and if the LLM doesn't hold the copyright, it has no legal rights to assign to you.
"but the ability for the agent to build it in the first place is based off of stolen IP."
I honestly don't understand why the attitude that underlies this is so prevalent.
When I write code, what I write and how I write it is informed by having read countless source code files over my education and my career. Just as I ingest all that experience to fine-tune how my later code is written, so does the LLM from the code it's seen.
The immediate retort to that is that the LLM is looking at code that wasn't its to read. But I don't think that's a valid objection. Pretty much by definition, everything I've learned from has a copyright on it, and other than my own code on my own time, that copyright is owned by someone else. Much of the code that's built up my understanding has been protected by NDA, or even defense-department classifications: it wasn't mine in any way. But it still informs how I do all my future coding.
By analogy: I'm also an artist, especially since my retirement. My approach to photography was influenced by Ansel Adams, and countless other artists whose works I've seen displayed in museums, or in publications and online. My current approach to painting was inspired by Bob Ross and others, and the teachers who have helped me develop. I've taken pieces of what I've seen in all their work, and all of that comes out in my photos and paintings, to varying degrees.
I've taken ideas from others in code and in art, and produced something (hopefully!) different by combining those bits with my own perspective. I don't think anyone has a claim on my product because of this relationship.
Likewise, I know that many of my successors have learned from my code (heck, I led teams, wrote one book about software development!). And I hope that someday my artwork has developed to the point where there's something in it that's worth someone else's attention to assimilate. I've never for a minute - even decades before the advent of LLMs - hoped or even imagined that my work would remain locked up with me, and that the ideas would follow me to the grave.
As they say, we are all standing on the shoulders of giants. None of us would be able to achieve the tiniest fraction of what we have, without assimilating what has come before us. Through many layers of inheritance it's constantly being incorporated in subsequent works.
In a few decades at best, I'll be dead. It probably won't be very long after that when people even forget my name. But the idea that something I've done - my work in developing software systems, or in my photography and painting - will continue to have ripples through time, inspires me and gives me hope that I'll have some tiny shred of immortality beyond my personal demise.
I've created my own DSL, and instruct Claude Code how to generate code for this DSL using skills.
Since this is a new language, documented neither on the web nor on GitHub, Claude's ability to write it is not based off of stolen IP. At most it's trained on concepts from other languages, just like we can train ourselves on code on GitHub.
Maybe a good reason to create a new programming language?
I wonder what OSS licenses would have looked like if we saw all of this coming.
I could possibly see an argument for the owner being whoever paid for the tokens used, but honestly I think the argument for that is weaker than what you're suggesting; I'm merely playing devil's advocate here.
I don't think there's even a valid argument for any other ownership model, or at least none that I can think of.
The LLM is just a database. It's like saying 'I own the copyright to what comes out of an API because I crafted the query' or 'I own the copyright to the responses I get from the bots on the Starship Titanic because I crafted the message they respond to'.
No, that human owns the copyright on the prompt, not on the work product.
I agree with this sentiment, because the person directing the agent can still direct it in a way where it'll produce a better or worse output than another person directing it.
This interpretation makes sense. I think even the 'fair use' clause in the US doesn't protect LLMs. One argument I've often heard is that LLMs synthesize their training set to produce novel output in the same way a human would. That may be the case, but legally an LLM isn't a human. You can't look at the output of an LLM and say that it's 'fair use' with respect to its training set, because it hasn't been established that AI has the same 'fair use' right as a human does. It's already pushing it that companies have this right, let alone an AI agent. And that's just one problem.
This also ignores the fact that the researchers who compiled the training set COPIED the original copyrighted data in order to produce it. They either copied the entire work into the training set or fed the entire work directly into the LLM. In either case, at some point the entire work was copied verbatim into the LLM's input layer before it was ingested by the AI. The researchers copied the copyrighted content without permission.
Also, when it comes to code, the case is even more damning, because the vast majority of the code that LLMs are trained on was not only copyrighted but also subject to an MIT license (at best), and even the MIT license, which is one of the most permissive licenses in common use, still says clearly:
"Permission is hereby granted, free of charge, to any person obtaining a copy of this software"
The word 'person' is used very intentionally here.
I think there should be several kinds of AI taxes which should be distributed to all copyright holders. There should be a tax to go to writers (and book authors), a tax to go to open source developers and a tax for the general population to distribute as UBI to account for small-form content like comments and photography...
People invested a lot of time building their entire careers around the assumption of copyright protection; so for it to be violated on such a scale would be a massive betrayal.
I find the idea that code could be copyrightable weak. There are only so many ways to write a for loop. Similarly, you can't copyright schematics (apart from their exact visual representation as a form of art). Code is just a schematic.
You can think that's how it should be. But that's not necessarily how it is. I'm reminded of the famous monkey selfie copyright dispute [1]. A photographer set up a camera and gave it to a monkey, but after a legal dispute, courts decided nobody owned the copyright.
I can totally see this applying here as well.
Now, this doesn't resolve the issue of AIs being trained on copyrighted works they had no rights to. The counterargument is that the result is a derivative or transformative work, but I don't believe that's settled law at all.
[1]: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
Funny how the copyright industry was able to spin copyright infringement into the pejorative "stealing". If you still have the item, what was stolen?
Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that unauthorized phonorecords of copyrighted musical compositions are not "stolen, converted or taken by fraud" goods under the National Stolen Property Act.