>You are never going to get the exact book word-for-word using LLM.
This is pretty much the exact claim of a NYT lawsuit against OpenAI.
"One example: Bing Chat copied all but two of the first 396 words of its 2023 article “The Secrets Hamas knew about Israel’s Military.” An exhibit showed 100 other situations in which OpenAI’s GPT was trained on and memorized articles from The Times, with word-for-word copying in red and differences in black."
https://www.hollywoodreporter.com/business/business-news/cou...
Yes, LLMs fundamentally operate as a lossy compression scheme for their training data. There have been countless examples of them reproducing their training data with very high accuracy.
People claim that the data isn't stored, but clearly a representation of it is encoded and reproducible. I saw ChatGPT plagiarise a Stack Overflow comment word for word just two days ago.
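If anyone wants to check a "word for word" claim like this themselves, here is a rough sketch of how you might measure it. It is not how the NYT exhibit was produced, just a word-level diff using Python's standard `difflib`. The function name `verbatim_overlap` and the sample strings are made up for illustration.

```python
from difflib import SequenceMatcher


def verbatim_overlap(source: str, output: str) -> tuple[float, int]:
    """Compare two texts word by word.

    Returns (fraction of source words that reappear in order in the output,
    length in words of the longest verbatim run).
    """
    src = source.split()
    out = output.split()
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent items, which would otherwise skew results on long texts.
    matcher = SequenceMatcher(None, src, out, autojunk=False)
    blocks = matcher.get_matching_blocks()  # last block is a size-0 sentinel
    matched = sum(b.size for b in blocks)
    longest = max((b.size for b in blocks), default=0)
    fraction = matched / len(src) if src else 0.0
    return fraction, longest


# Hypothetical example: one word changed out of nine.
source = "the quick brown fox jumps over the lazy dog"
output = "the quick brown cat jumps over the lazy dog"
fraction, longest = verbatim_overlap(source, output)
print(f"{fraction:.0%} of source words matched, longest run {longest} words")
```

A high fraction combined with a long verbatim run is the kind of thing the exhibit describes (matching text in red, differences in black), though whitespace splitting is a crude tokenizer and punctuation differences will count as mismatches.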