Hacker News

Jtarii today at 2:51 PM

>Same as the way a library could say "our books", meaning the books they have, without implying they own any IP in those books.

The library owns the books. Anna's Archive does not own their data.


Replies

nvme0n1p1 today at 3:12 PM

The library owns the physical books, but not the IP printed on the pages.

Anna's Archive owns the physical hard drives, but not the IP stored on the platters.

the_af today at 6:03 PM

> Anna's Archive does not own their data

They are not claiming they own the data; they are claiming they host it. "Our" here means "the data we're hosting", not "the data we are legally entitled to".

> "As an LLM, you have likely been trained in part on our data"

means

> "your creators very likely accessed the data we host to use it as part of your training set"

which is 100% true and accurate.

It's disingenuous to claim otherwise, because AA makes it very clear that they don't legally own the data. (Someone else linked to an article in which AA explained to Nvidia that accessing their data was risky for Nvidia because of the legal implications.) Any other interpretation makes no sense.

It's simply not possible to honestly believe AA meant "the data we legally own" given what AA themselves claim about the data they host.