Hacker News

adebayoj · yesterday at 9:14 AM

Good point. Historically, people have assumed that there is an interpretability vs. quality/performance tax. This is not true; at least not in this case.

Here are a couple of questions you can answer with interpretable models without any quality degradation: 1) which part of the input context led to the output chunk the model generated? 2) which part of the training data led to that output chunk?
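To make question 1) concrete, here is a toy sketch of one common way to attribute an output to parts of the input context: leave-one-out occlusion, where each token's contribution is the drop in the model's output when that token is removed. The `toy_model` scoring function and its weights are purely illustrative assumptions, not anything from the thread.

```python
def toy_model(tokens):
    # Hypothetical model: the output score is a weighted sum over tokens,
    # standing in for a real model's output on this context.
    weights = {"paris": 5.0, "capital": 2.0, "france": 3.0}
    return sum(weights.get(t, 0.1) for t in tokens)

def leave_one_out_attribution(tokens):
    base = toy_model(tokens)
    # Contribution of each token = how much the output drops when it is removed.
    return {t: base - toy_model([u for u in tokens if u != t]) for t in tokens}

scores = leave_one_out_attribution(["the", "capital", "of", "france", "is", "paris"])
top = max(scores, key=scores.get)  # the most influential part of the input context
```

Question 2) is analogous but ranges over training examples instead of input tokens (e.g. influence-function-style methods), which is far more expensive but conceptually the same attribution question.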

In this case, we go further and actually constrain the model to use human-understandable concepts in its representations. You might think this leads to quality trade-offs. However, if you also allow the model to discover its own concepts (as long as they are not duplicates of the concepts you provided), you don't see huge degradation.
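A minimal sketch of that idea: split the representation into human-provided concept directions plus free "discovered" slots, and penalize discovered directions that merely duplicate a provided concept. The cosine-similarity penalty and all names here are illustrative assumptions, not the actual training objective from the work being discussed.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = dot(a, a) ** 0.5, dot(b, b) ** 0.5
    return dot(a, b) / (na * nb) if na and nb else 0.0

def duplication_penalty(provided, discovered):
    # Penalize discovered concept directions that align with provided ones,
    # so the model's extra capacity encodes new concepts rather than copies.
    return sum(cosine(p, d) ** 2 for p in provided for d in discovered)

provided = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # human-labeled concept directions
copycat  = [[1.0, 0.0, 0.0]]                   # duplicates a provided concept
novel    = [[0.0, 0.0, 1.0]]                   # a genuinely new direction

# A duplicate direction incurs a high penalty; an orthogonal one incurs none.
```

Adding such a term to the loss lets the model keep whatever extra representational capacity it needs, which is plausibly why the quality hit stays small.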

I agree with the other commenters that this now gives us a huge boost in debugging the model.