> In traditional software law, a “clean room” rewrite requires two teams
So, I dislike AI and wish it would disappear, BUT!
The argument is strange here, because ... how can a2mark ensure that AI did NOT do a clean-room-conforming rewrite? In theory, AI can do precisely this; you just need to make sure the model actually followed that process, and that can be verified, in principle. So I don't fully understand a2mark here. Yes, AI may make use of the original source code, but it could "implement" things on its own. Ultimately this is finite complexity, not infinite complexity. I think a2mark's argument is weak here, in theory. And I say this as someone who dislikes AI. The main question is: can computers do a clean rewrite, in principle? I think the answer is yes. That is not to say Claude did this here, mind you; I really don't know the particulars. But the underlying principle? I don't see why AI could not do this. a2mark may need to reconsider the statement here.
Turns out there’s no need to speculate. Someone pointed out on GH [0] that the AI was literally prompted to copy the existing code:
> *Context:* The registry maps every supported encoding to its metadata. Era assignments MUST match chardet 6.0.0's `chardet/metadata/charsets.py` at https://raw.githubusercontent.com/chardet/chardet/f0676c0d6a...
> Fetch that file and use it as the authoritative reference for which encodings belong to which era. Do not invent era assignments.
[0] https://github.com/chardet/chardet/issues/327#issuecomment-4...
The foundation model probably includes the original project in its training set, which might be enough for a court to consider it “contaminated”. Training a new foundation model without it is technically possible, but would take months and cost millions of dollars.
A clean room is sufficient, but not necessary, to avoid accusations of license violation.
a2mark has to demonstrate that v7 is "a work containing v6 or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language", which is different from demanding a clean-room reimplementation.
Theoretically, the existence of a publicly available commit that is half v6 code and half v7 code could be used to argue that this part of v7 has been infected by the LGPL and must thus infect the rest of v7, but that's IMO going against the spirit of the [L]GPL.
I think the problem here is that an AI is not a legal entity. It doesn't matter if you, as an individual, run an AI that takes the source and dumps out a spec that you then feed into another AI. The legal liability lies with the operator of the AI; the original copyleft license was granted to a person, not to a robot.
Now if you had two entirely distinct humans involved in the process, that might work.
> how can a2mark ensure that AI did NOT do a clean-room conforming rewrite?
In cases like this it is usually incumbent on the entity claiming the clean-room situation was pure to show their working. For instance, how Compaq clean-room cloned the IBM BIOS chip [1] was well documented (the procedures used, records of comms between the teams involved), whereas some other manufacturers did face costly legal troubles from IBM.
So the question is “is the clean-room claim sufficiently backed up to stand legal tests?” [and moral tests, though the AI world generally doesn't care about failing those]
--------
[1] The one part of their PCs that was not essentially off-the-shelf, so once it could be reliably and legally mimicked, this created an open market for IBM PC clones.