Hacker News

Ajedi32 yesterday at 3:16 PM

No, public data is not generally written by "experts expecting compensation".

By the way, I don't expect you to pay me for this comment. You can just read it for free. You're welcome.


Replies

jaen yesterday at 4:25 PM

Ugh, please don't read strawmen into others' arguments, and try to follow the HN guidelines.

Also, how about making proper arguments yourself? The vast majority of the training data isn't generated by company-paid AI experts either.

Notably, books, even though they don't form a large part of the training data, significantly improve performance on some tasks (in the same way that expert-generated data does).

Why do you think the AI labs are so eager about scanning (and then destroying) every book on the planet?

If you removed all copyrighted works from the training corpus, the model would be notably weaker.

calgoo yesterday at 3:24 PM

No, but people do upload data with an expectation that it will not be used without their permission (unless they release it under a BSD, MIT, or public-domain-like license). Otherwise, the platform and/or the user expect the data not to be used for purposes other than what it was intended for. Your comment is still your comment, and the Hacker News platform also has a say in this. If there had been an opt-in, then fine, no problem, but there was none; they just trained on everything available, including pirated books downloaded from the internet.

pastel8739 yesterday at 3:21 PM

Books?
