Hacker News

shevy-java yesterday at 10:11 AM (5 replies)

> In traditional software law, a “clean room” rewrite requires two teams

So, I dislike AI and wish it would disappear, BUT!

The argument is strange here, because ... how can a2mark ensure that the AI did NOT do a clean-room-conforming rewrite? In theory an AI can do precisely this; you just need to make sure the model used is actually set up that way, and that can be verified, in principle. Yes, the AI may make use of the original source code, but it could also "implement" things on its own. Ultimately this is finite complexity, not infinite complexity.

So I think a2mark's argument is weak in theory, and I say this as someone who dislikes AI. The main question is: can computers do a clean rewrite, in principle? I think the answer is yes. That is not to say Claude did this here, mind you; I really don't know the particulars. But the underlying principle? I don't see why an AI could not do this. a2mark may need to reconsider the statement here.
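To make the "verifiable in theory" point concrete, here is a minimal, hypothetical sketch of such a pipeline (not how chardet v7 was actually produced, and all names are invented): one model reads the original source and emits only a behavioral spec; a second model implements from that spec alone, so the spec is the single artifact crossing the wall and can be audited.

```python
# Hypothetical two-stage "clean room" pipeline. The spec model may see the
# copyrighted source; the implementation model must only ever see the spec.
def clean_room_rewrite(original_source, spec_model, impl_model):
    """Return (spec, new_code); `spec` is the only thing crossing the wall."""
    spec = spec_model(original_source)   # "dirty" side: reads the original
    new_code = impl_model(spec)          # "clean" side: never sees the original
    return spec, new_code

# Toy stand-ins for real models, just to show the information flow.
spec_model = lambda src: "detect a byte-order mark, then fall back to heuristics"
impl_model = lambda spec: f"# implemented from spec: {spec!r}"

spec, new_code = clean_room_rewrite("<original source text>", spec_model, impl_model)
assert "<original source text>" not in new_code  # the wall held
```

The audit story is the point: because only `spec` crosses between the two stages, it can be logged and reviewed for copied expression, much like the documented inter-team communications in a human clean-room effort. Whether training-set contamination of the second model defeats this in practice is exactly what the thread goes on to debate.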


Replies

dspillett yesterday at 10:41 AM

> how can a2mark ensure that AI did NOT do a clean-room conforming rewrite?

In cases like this it is usually incumbent on the entity claiming the clean-room situation was pure to show their working. For instance, Compaq's clean-room clone of the IBM BIOS chip [1] was well documented (the procedures used, records of communications between the teams involved), whereas some other manufacturers did face costly legal trouble from IBM.

So the question is "is the clean-room claim sufficiently backed up to stand up to legal tests?" [and moral tests, though the AI world generally doesn't care about failing those]

--------

[1] The BIOS was the one part of their PCs that was not essentially off-the-shelf, so once it could be reliably and legally mimicked, an open IBM PC clone market emerged.

foltik yesterday at 12:53 PM

Turns out there’s no need to speculate. Someone pointed out on GH [0] that the AI was literally prompted to copy the existing code:

> *Context:* The registry maps every supported encoding to its metadata. Era assignments MUST match chardet 6.0.0's `chardet/metadata/charsets.py` at https://raw.githubusercontent.com/chardet/chardet/f0676c0d6a...

> Fetch that file and use it as the authoritative reference for which encodings belong to which era. Do not invent era assignments.

[0] https://github.com/chardet/chardet/issues/327#issuecomment-4...

titanomachy yesterday at 10:14 AM

The foundation model probably includes the original project in its training set, which might be enough for a court to consider it “contaminated”. Training a new foundation model without it is technically possible, but would take months and cost millions of dollars.

orthoxerox yesterday at 11:07 AM

A clean room is sufficient, but not necessary, to avoid accusations of license violation.

a2mark has to demonstrate that v7 is "a work containing the v6 or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language", which is different from demanding a clean-room reimplementation.

Theoretically, the existence of a publicly available commit that is half v6 code and half v7 code could be used to argue that this part of v7 has been infected by the LGPL and must therefore infect the rest of v7, but that's IMO going against the spirit of the [L]GPL.

__alexs yesterday at 10:16 AM

I think the problem here is that an AI is not a legal entity. It doesn't matter if you, as an individual, run an AI that takes the source and dumps out a spec that you then feed into another AI. The legal liability lies with the operator of the AI; the original copyleft license was granted to a person, not to a robot.

Now, if you had two entirely distinct humans involved in the process, that might work, though.