logoalt Hacker News

amenhotepyesterday at 3:52 PM1 replyview on HN

When you buy, or pirate, a book, you didn't enter into a business relationship with the author specifically forbidding you from using the text to train models. When you get tokens from one of these providers, you sort of did.

I think it's a pretty weak distinction and by separating the concerns, having a company that collects a corpus and then "illegally" sells it for training, you can pretty much exactly reproduce the acquire-books-and-train-on-them scenario, but in the simplest case, the EULA does actually make it slightly different.

Like, if a publisher pays an author to write a book, with the contract specifically saying they're not allowed to train on that text, and then they train on it anyway, that's clearly worse than someone just buying a book and training on it, right?


Replies

BeetleByesterday at 4:23 PM

> When you buy, or pirate, a book, you didn't enter into a business relationship with the author specifically forbidding you from using the text to train models.

Nice phrasing, using "pirate".

Violating the TOS of an LLM is the equivalent of pirating a book.