Yeah, an LLM, being a machine obviously shouldn't hold copyright. But that doesn't stop pe...

martin-t • last Saturday at 12:19 PM • 2 replies • view on HN

Yeah, an LLM, being a machine obviously shouldn't hold copyright. But that doesn't stop people claiming that running vast amounts of code through an LLM can strip copyright from it.

Ultimately LLMs (the first L stands for large and for a good reason) are only possible to create by taking unimaginable amounts of work performed by humans who have not consented to their work being used that way, most of whom require at least being credited in derivative works and many of whom have further conditions.

Now, consent in law is a fairly new concept and for now only applied to sexual matters but I think it should apply to every human interaction. Consent can only be established when it's informed and between parties with similar bargaining power (that's one reason relationships with large age gaps are looked down upon) and can be revoked at any time. None of the authors knew this kind of mass scraping and compression would be possible, it makes sense they should reevaluate whether they want their work used that way.

There are 3 levels to this argument:

1) The letter of the law - if you understand how LLMs work, it's hard to see them as anything more than mechanical transformers of existing work so the letter should be sufficient.

2) The intent of the law - it's clear it was meant to protect human authors from exploitation by those who are in positions where they can take existing work and benefit from it without compensating the authors.

3) The ethics and morality of the matter - here it's blatantly obvious that using somebody's work against their wishes and without compensating them is wrong.

In an ideal world, these 3 levels would be identical but they're not. That means we should strive to make laws (in both intent and letter) more fair and just by changing them.

Replies

MarsIronPI • last Saturday at 1:06 PM

If consent to use of your code in AI training can be revoked at any time, that makes training impossible, since if anyone ever withdraws consent, it's not like you can just take out their work from your finished model.

➕ show 1 reply

adrian_b • last Saturday at 3:15 PM

Almost all the productivity enhancement provided by an AI coding assistant is provided by circumventing the copyright laws, with the remaining enhancement being provided by the fact that it automates the search-copy-paste loop that you would do if you had direct access to the programs used during training.

(Much of the apparent gain of the automatic search-copy-paste is wasted by skipping the review phase that would have been done at that time when that were done manually, which must then be done in a slower manner when you must review the harder-to-understand entire program generated by the AI assistant.)

Despite the fact that AI coding assistants are copyright breaking tricks, the fact that this has become somehow allowed is an overall positive development.

The concept of copyright for programs has been completely flawed from its very beginning. The reason is that it is absolutely impossible to write any kind of program that is not a derivative of earlier programs.

Any program is made by combining various standard patterns and program structures. You can construct a derivation sequence between almost any 2 programs, where you decompose the first in some typical blocks, than compose the second program from such blocks, while renaming all identifiers.

It is quite subjective to decide when a derivation sequence becomes complex enough that the second program should not be considered as a derivative of the first from the point of view of copyright.

The only way to avoid the copyright restrictions is to exploit loopholes in the law, e.g. if translating an algorithm to a different programming language does not count as being derivative or when doing other superficial automatic transformations of a source program changes its appearance sufficiently that it is not recognized as derivative, even if it actually is. Or when combining a great number of fragments from different programs is again not recognized as derivative, though it still kind of is.

The only way how it became possible for software companies like Microsoft or Adobe to copyright their s*t is because the software industry based on copyrighted programs has been jumpstarted by a few decades of programming during which programs were not copyrighted, which could then be used as a base by the first copyrighted programs.

So AI coding agents allow you to create programs that you could not have written when respecting the copyright laws. They also may prevent you from proving that a program written by someone else infringes upon the copyright that you claim for a program written with assistance.

I believe that both these developments are likely to have more positive consequences than negative consequences. The methods used first in USA and then also in most other countries (due to blackmailing by USA) for abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades.

The most ridiculous claim about the copyright of programs is that it is somehow beneficial for "creators". Artistic copyrights sometimes are beneficial for creators, but copyrights on non-open-source programs are almost never owned by creators, but by their employers, and even those have only seldom any direct benefit from the copyright, but they use it with the hope that it might prevent competition.

➕ show 1 reply

alt Hacker News

Replies