Looks neat and original, congrats!
I don't quite grasp how to interpret the training data attribution process. For example, it seems to say that for a given sentence like "They argued that humans tend to weigh losses more heavily than gains, leading to risk aversion", 24% is attributed to Wikipedia and 23% to Arxiv.
Does that mean that the concepts used in this sentence are also found in those datasets, and that's what's getting compared here? Or does it mean that you can track down which parts of the training data were interpolated to create that sentence?
Great questions. We weren't very explicit about the training data attribution process, and we'll discuss it in more detail in future work. To answer directly: it's the latter. We can track down which parts of the training data were interpolated to create that sentence. For those training data sentences, we then compare the concepts in the generated text against the concepts in the training text.
We can attribute to exact sentences and chunks in the training data. For the first release, we are sharing only concept similarities. Over the coming weeks, we'll share and discuss how you can actually map to the exact training sentence and chunk with the model.
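As a rough illustration of how per-source percentages like the 24% and 23% in the question could fall out of chunk-level attribution, here is a minimal sketch. It assumes each training chunk already carries a source label and a non-negative attribution score. The function name, the scores, and the simple sum-and-normalize step are all hypothetical, and this is not a description of how the actual model computes its numbers.

```python
from collections import defaultdict


def source_shares(chunk_scores):
    """Turn (source, score) pairs into per-source percentage shares.

    Hypothetical helper: assumes scores are non-negative and already
    computed by some attribution method that is not shown here.
    """
    totals = defaultdict(float)
    for source, score in chunk_scores:
        if score < 0:
            raise ValueError("attribution scores must be non-negative")
        totals[source] += score

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing to attribute

    # Normalize so the shares across all sources sum to 100.
    return {src: 100.0 * total / grand_total for src, total in totals.items()}


# Made-up chunk scores, chosen only so the result matches the
# percentages quoted in the question.
example_chunks = [
    ("Wikipedia", 30.0),
    ("Wikipedia", 18.0),
    ("Arxiv", 46.0),
    ("Other", 106.0),
]
shares = source_shares(example_chunks)
```

In this toy setup the per-source number is just an aggregate over individual chunks, which is why the same underlying scores could support both a source-level summary and a mapping back to specific sentences.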
For a technical overview of how some of these models work, check this link out: https://www.guidelabs.ai/post/prism/