Hacker News

63stack, yesterday at 9:47 AM

GitHub has posted that they will now train on everyone's data (even private) unless you opt out (until they change their mind on that). Anthropic has been training on your data on certain tiers already. Meta torrented books to train their models.

Surely if your license says "LLM output trained on this code is legally tainted", it is going to dissuade them.


Replies

archagon, yesterday at 6:42 PM

No, it won’t dissuade them. But when we finally get the chance to legally beat the shit out of these companies, I want to reserve my place in line.

Alternatively, they can learn to trust me on this and simply exclude/evict my code from the training corpus.