
cogman10, yesterday at 2:57 PM

> One could carefully calculate exactly how much a given document in the training set has influenced the LLM's weights involved in a particular response.

Not really.

Think of, for example, a movie like "Who Framed Roger Rabbit". It had intellectual property from all over. Had the studios failed to get the rights to any one of those properties, they could have been sued for copyright infringement. It's not really a question of influence.

So yeah, while the LLM might have been trained on the kernel, it was also likely trained on code with commercial licenses. Conversely, because it was trained on code with GPL licenses, that might mean commercial software with LLM contributions needs to inherit the GPL (and a bunch of other licenses) to be legal.

It's a big old quagmire and I think lawyers haven't caught up enough with how LLMs work to realize this.