
vintagedave yesterday at 9:12 AM

This is fantastic to read. LLMs feel like black boxes, and for the large ones especially I have a sense that they genuinely form concepts. Yet the internals have remained opaque. I remember reading that LLMs cannot explain their own behaviour when asked.

I feel this would give insight into all that, including the degree of true conceptualisation. I'm curious whether it can also show what else the model is aware of when answering.


Replies

adebayoj yesterday at 9:31 AM

Our decomposition allows us to answer questions like: for 84 percent of the model's representation, we know it is relying on this concept to give an answer.

We can also trace its behavior back to the training data that led to it, which can show us where some of these concepts come from.
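
To make the attribution idea concrete, here is a minimal sketch of one way to quantify "what fraction of a representation is explained by known concepts": project a hidden activation onto a small dictionary of concept directions and measure the explained fraction. The function names, the least-squares projection, and the toy data are illustrative assumptions, not the actual decomposition method described above.

    import numpy as np

    def explained_fraction(hidden: np.ndarray, concept_dirs: np.ndarray) -> float:
        """Fraction of a hidden vector's squared norm captured by the
        span of a (hypothetical) set of concept directions."""
        # hidden: (d,) activation; concept_dirs: (k, d), one concept per row
        coeffs, *_ = np.linalg.lstsq(concept_dirs.T, hidden, rcond=None)
        reconstruction = concept_dirs.T @ coeffs  # projection onto the concept span
        return float(np.linalg.norm(reconstruction) ** 2 / np.linalg.norm(hidden) ** 2)

    # Toy usage: a 3-concept dictionary in an 8-dimensional activation space
    rng = np.random.default_rng(0)
    dirs = rng.normal(size=(3, 8))
    h = 0.9 * dirs[0] + 0.1 * rng.normal(size=8)  # mostly concept 0, plus noise
    print(f"{explained_fraction(h, dirs):.0%} of this representation is explained by the concepts")

A statement like "84 percent of the model's representation" would then correspond to this explained fraction, averaged over the activations involved in producing an answer, under the assumption above.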