op here, I mostly agree with your comment! However, our model does more than this. For any chunk the model generates, it can answer: which concept, in the model's representations, was responsible for that token(s). In fact, we can answer the question: what training data caused the model to be generated too! We force this to be a constraint as part of the architecture and the loss function for our you train the model. In fact, you can get are the high level reasons for a model's answer on complex problems.
> what training data
The demo just says "Wikipedia" or "ArXiV". That's pretty broad and maybe not that useful. Can it get more specific than that, like the actual pages?
All of the examples on the linked page seem to be "good" outputs. Attribution sounds most useful to me in cases where an LLM produces the typical kind of garbage response: wrong information in the training data, hallucinations, sycophancy, over-eagerly pattern matching to unasked but similar, well-known questions. Can you give an example of a bad output, and show what the attribution tells us?