The "AI" content output is fundamentally unable to prevent piracy of other people's content (it demonstrably can't, even on a CEO live stream). Most models will happily spew any statistically salient trademarked, copyrighted, and/or patented code/music/images/video. Note too that GPL/LGPL are contaminating licenses, so legal submarines will surface sooner or later if such output is injected into closed-source projects.
The "how it happens" part is legally irrelevant "[piracy] with extra steps," but if you are interested in the details, see below. =3
https://www.youtube.com/watch?v=YhgYMH6n004
https://www.cbsnews.com/news/taylor-swift-ai-voice-likeness-...
Here is a simplified explanation of how vector search is done in many models:
https://www.youtube.com/watch?v=YDdKiQNw80c
And a more detailed toy implementation to learn how to build your own:
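Setting the linked material aside, the core mechanic can be sketched in a few lines: embed items as vectors, then brute-force rank them against a query by cosine similarity. Everything below (the document names, the 3-d vectors) is illustrative, not taken from the links above:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query, index, top_k=2):
    # Brute-force scan: score every stored vector against the query,
    # then return the best-matching keys, highest similarity first.
    scored = sorted(index.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [key for key, _ in scored[:top_k]]

# Tiny made-up "embedding" index: documents mapped to 3-d vectors.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

print(vector_search([1.0, 0.05, 0.0], index))  # doc_a, then doc_b
```

Real systems swap the brute-force scan for an approximate nearest-neighbor index, but the ranking idea is the same.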
> The "AI" content output is fundamentally unable to prevent piracy of other people's content (...)
Your comment makes no sense. The whole concept of "piracy" is meaningless when applied to LLMs, unless you go way out of your way to prompt models to output specific works verbatim.
Also, you do not "pirate" Harry Potter by prompting a model to generate a story that directly or indirectly involves Harry Potter, any more than you would by writing one yourself. As always, you can argue trademark or copyright violations if someone uses such a work for commercial purposes, but the LLM itself is orthogonal to that question.
Just because Photoshop lets you hack together variants of the Coca-Cola logo does not mean Adobe is liable for trademark or copyright violations.