Hacker News

xdennis yesterday at 4:37 AM (5 replies)

That's one aspect, which is a bit of a gray zone. But Anthropic trained on pirated books. That is explicitly illegal.


Replies

mcast yesterday at 2:22 PM

That ship has sailed. I would wager all the AI labs are ingesting anything human-generated, whether that means Hollywood movies, Taylor Swift’s discography, YouTube videos, or private GitHub source repos.

The reward for having a competitive edge is exponentially higher than the risk of a lawsuit. Politicians are still old bureaucrats who don’t understand technology.

heroh yesterday at 11:42 AM

So did Meta for Llama.

The entire chat thread and email exchange was exposed in discovery; apparently Zuck signed off on it. In one of the IM exchanges, one of them says ‘everyone is doing it’.

https://x.com/jason_kint/status/1879152507865485497?s=20

lambdaone yesterday at 11:34 AM

As I understand it, what was "explicitly illegal" was copying the books, in the sense of mere copying before feeding them to the model, and this is what the Anthropic copyright settlement is about.

Actually processing them through the model, though, was considered transformative and therefore fair use.

freejazz yesterday at 5:52 PM

They didn't train on the books and that court only found that the pirating was illegal anyway.

ares623 yesterday at 7:38 AM

I'd love to see an open-source project that's basically a torrent client for downloading pirated material, but it trains an AI model "in the background" using the downloaded content. That way everyone can claim fair use for possessing copyrighted material. I mean, there's precedent, right?