lbrito • today at 3:06 PM • 1 reply

>You are never going to get the exact book word-for-word using LLM.

This is pretty much the exact claim of a NYT lawsuit against OpenAI.

"One example: Bing Chat copied all but two of the first 396 words of its 2023 article “The Secrets Hamas knew about Israel’s Military.” An exhibit showed 100 other situations in which OpenAI’s GPT was trained on and memorized articles from The Times, with word-for-word copying in red and differences in black."

https://www.hollywoodreporter.com/business/business-news/cou...
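The exhibit's "word-for-word copying in red and differences in black" is essentially a word-level diff. Below is a minimal sketch of how one might count verbatim overlap between a source text and a model's output. It assumes plain whitespace tokenization and uses Python's standard `difflib`. The function names are hypothetical, and this is not the method the lawsuit exhibit actually used.

```python
import difflib


def verbatim_overlap(source: str, output: str) -> tuple[int, int]:
    """Return (source words reproduced verbatim in output, total source words)."""
    src_words = source.split()  # assumption: whitespace tokenization
    out_words = output.split()
    # autojunk=False stops frequent words from being ignored on long texts
    matcher = difflib.SequenceMatcher(a=src_words, b=out_words, autojunk=False)
    # Sum the sizes of all matching runs (the final dummy block has size 0)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched, len(src_words)


def divergences(source: str, output: str) -> list[tuple[str, str]]:
    """List (source span, output span) pairs where the two texts differ."""
    src_words, out_words = source.split(), output.split()
    matcher = difflib.SequenceMatcher(a=src_words, b=out_words, autojunk=False)
    return [
        (" ".join(src_words[i1:i2]), " ".join(out_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, inserted, or deleted spans
    ]


if __name__ == "__main__":
    src = "the quick brown fox jumps over the lazy dog"
    out = "the quick brown cat jumps over the lazy dog"
    print(verbatim_overlap(src, out))  # (8, 9)
    print(divergences(src, out))       # [('fox', 'cat')]
```

A result like "all but two of the first 396 words" would correspond to `verbatim_overlap` returning `(394, 396)` on that excerpt, assuming the same tokenization.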

Replies

20k • today at 5:58 PM

Yes, LLMs fundamentally operate as a lossy compression scheme for their training data. There have been countless examples of them reproducing their training data with very high accuracy.

People claim that the data isn't stored, but clearly a representation of it is encoded and reproducible. Just two days ago, I saw ChatGPT plagiarise a Stack Overflow comment word for word.
