You are exactly right, it is guiding the model, during training, with concepts and the dictionary. This is important because dictionary learning for interpretability (post hoc) is not currently reliable: https://www.arxiv.org/abs/2602.14111