logoalt Hacker News

adebayojyesterday at 10:00 AM0 repliesview on HN

You are exactly right, it is guiding the model, during training, with concepts and the dictionary. This is important because dictionary learning for interpretability (post hoc) is not currently reliable: https://www.arxiv.org/abs/2602.14111