No, it can’t recreate a book. Well, maybe it could get most of the way for the Bible. That is an exceptional case because its adherents are constantly quoting verses religiously. I expect it’s the most reproduced, quoted, and translated book in history by a very significant margin. It’s also not copyrighted.
Can you do this for the general case? No, not even for extremely popular books. People might quote Harry Potter a lot, but they don’t quote the entire thing over and over, chapter and verse, on hundreds of thousands of different websites. The number of times Bible verses appear in the training data is going to absolutely dwarf the number of times Harry Potter quotes appear, and people aren’t quoting all parts of Harry Potter, just the interesting parts.
> When I ask ChatGPT for a specific page or so from HP, I get the impression that the model would be perfectly capable of doing so but is hindered by extra work OpenAI put in to prevent the answer, specifically because of copyright.
They do put extra work in to filter this stuff out, but even if they didn’t, the model wouldn’t be able to reproduce entire chapters, let alone entire books.
You can test this for yourself. Remember, this lawsuit isn’t against OpenAI, it’s against Meta. Download Llama and try to get it to reproduce Harry Potter. There won’t be any guardrails imposed on top of the model if you run it locally.
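If you want a number rather than an impression, here is a rough sketch of how you could score the result. It assumes you already have the model's continuation as a string (pasted in from whatever local runner you use) and a reference passage to compare against. The function names `longest_verbatim_run` and `verbatim_fraction` are my own, not from any library, and word-level matching is just one reasonable choice of metric.

```python
import string
from difflib import SequenceMatcher


def _words(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in stripped if w]


def longest_verbatim_run(reference: str, generated: str) -> list[str]:
    """Return the longest run of consecutive words shared by both texts."""
    ref_words = _words(reference)
    gen_words = _words(generated)
    # autojunk=False disables difflib's heuristic that ignores very common
    # elements in long sequences, which would skew results on real text.
    matcher = SequenceMatcher(None, ref_words, gen_words, autojunk=False)
    match = matcher.find_longest_match(0, len(ref_words), 0, len(gen_words))
    return ref_words[match.a : match.a + match.size]


def verbatim_fraction(reference: str, generated: str) -> float:
    """Share of the generated words covered by the longest verbatim run."""
    gen_words = _words(generated)
    if not gen_words:
        return 0.0
    return len(longest_verbatim_run(reference, generated)) / len(gen_words)


if __name__ == "__main__":
    # Public-domain stand-in text; swap in your own reference and model output.
    reference = "It was the best of times, it was the worst of times, it was the age of wisdom"
    generated = "The model wrote: it was the best of times, it was the age of foolishness"
    run = longest_verbatim_run(reference, generated)
    print(len(run), " ".join(run))
    print(verbatim_fraction(reference, generated))
```

Prompt with the opening lines of a chapter, feed the continuation in as `generated`, and see how long the verbatim run gets before the model drifts into paraphrase or invention.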
> People might quote Harry Potter a lot, but they don’t quote the entire thing over and over, chapter and verse, on hundreds of thousands of different websites.
I'm fairly certain I could find the entire thing in plain text in multiple places online. A quick Google search gives Philosopher's Stone as the second result, in PDF format on the Internet Archive, but I'm sure with a bit of looking I'd bump into a lot of plaintext copies.
They might have taken measures to keep this out of their training data (I think it would be fairly easy and something they'd likely do), but if they at any point failed for a book or so that they didn't consider, wouldn't my original question stand?