Hacker News

emsign | yesterday at 9:18 AM

By design you can't know whether the LLM doing the rewrite was exposed to the original code base, unless the AI company discloses its training material, which it won't, because it doesn't want to admit to breaking the law.


Replies

shevy-java | yesterday at 10:21 AM

> By design you can't know if the LLM doing the rewrite was exposed to the original code base.

I agree, in theory. In practice, courts will demand that the decision-making process be made public. The "we don't know" excuse won't hold; real people have to tell the truth in court, and LLMs may not lie to the court or resort to the Chewbacca defense either.

Also, I am pretty certain you CAN have AI models that explain how they arrived at a decision. And they can generate valid code too, so in theory anything here can be autogenerated.

airforce1 | yesterday at 4:53 PM

I don't see how this differs from current human poaching practices. It appears to be legal today to hire an employee from company A who has been "tainted" by company A's [proprietary AI secrets / proprietary CPU architecture secrets / etc.] to develop a competing offering for company B. It's not illegal for someone who worked at Intel for 20 years to go work for AMD, even though they are certainly "tainted" with all sorts of copyrighted/proprietary knowledge that will surely leak through at AMD. Patents may be a first line of defense for company A, but they can't prevent adjacent solutions that stop short of outright duplication and circumvent the patent.

soulofmischief | yesterday at 10:07 AM

Having seen the source for a project doesn't bar me from ever creating a similar project. The devil is in the details.

gostsamo | yesterday at 9:47 AM

It was exposed the moment it was shown the thing to rewrite.

skeledrew | yesterday at 9:53 AM

It doesn't even matter whether the LLM was exposed during training. A clean-room rewrite can be done by having one LLM produce a highly detailed analysis of the target (reverse engineering it if it's in binary form) and then providing that analysis to another LLM to base an implementation on.
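For illustration, a minimal sketch of that two-stage flow in Python, assuming a hypothetical call_llm(system, prompt) helper (not a real API) that wraps whichever model provider you use:

    # Clean-room rewrite in two stages. call_llm is a hypothetical
    # stand-in for your LLM provider's API, not a real library call.
    def call_llm(system: str, prompt: str) -> str:
        raise NotImplementedError  # wire up your provider here

    def analyze(target_source: str) -> str:
        # Stage 1: the "dirty" model sees the original and emits only a
        # behavioral specification (inputs, outputs, edge cases), no code.
        spec_instructions = (
            "Describe the observable behavior, inputs, outputs, and edge "
            "cases of this code. Do not quote or paraphrase any source lines."
        )
        return call_llm(system=spec_instructions, prompt=target_source)

    def reimplement(spec: str) -> str:
        # Stage 2: a separate model that never saw the original
        # implements purely from the specification.
        return call_llm(
            system="Write a program that satisfies this specification.",
            prompt=spec,
        )

The point of the split is the same as in a human clean room: the only channel between the model that saw the code and the model that writes the code is the spec, which you can inspect and retain as evidence.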

d1sxeyes | yesterday at 9:41 AM

Is it against the law for an LLM to read LGPL-licensed code?

That’s a complex question that isn’t solved yet. Clearly, regurgitating verbatim LGPL code in large chunks would be unlawful. What’s much less clear is a) how large do those chunks need to be to trigger LGPL violations? A single line? Two? A function? What if it’s trivial? And b) are all outputs of a system which has received LGPL code as an input necessarily derivative?

If I learn how to code in Python exclusively from reading LGPL code, and then go away and write something new, it's clear that I haven't committed any copyright violation under existing law, even if all I'm doing as a human is rearranging tokens whose meaning I learned from reading LGPL code to achieve a new result.

It’s a trying time for software and the legal system. I don’t have the answers, but whether you like them or not, these systems are here to stay, and we need to learn how to live with them.