
vintagedave yesterday at 9:12 AM

This is fantastic to read. LLMs feel like black boxes, and for the large ones especially I have a sense that they genuinely form concepts. Yet the internals have remained opaque. I remember reading that LLMs cannot explain their own behaviour when asked.

I feel this would give insight into all that, including the degree of true conceptualisation. I'm curious whether it can also show what else the model is aware of when answering.


Replies

adebayoj yesterday at 9:31 AM

Our decomposition allows us to answer questions like: for 84 percent of the model's representation, we know it is relying on this concept to give an answer.

We can also trace its behavior back to the training data that led to it, which can show us where some of these concepts come from.
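
To make the attribution idea concrete, here is a minimal sketch of one way to quantify "what fraction of a representation is explained by known concepts": project a hidden activation onto a small dictionary of concept directions and measure the explained fraction. The function names, the least-squares projection, and the toy data are illustrative assumptions, not the actual decomposition method described above.

    import numpy as np

    def explained_fraction(hidden: np.ndarray, concept_dirs: np.ndarray) -> float:
        """Fraction of a hidden vector's squared norm captured by the
        span of a (hypothetical) set of concept directions."""
        # hidden: (d,) activation; concept_dirs: (k, d), one concept per row
        coeffs, *_ = np.linalg.lstsq(concept_dirs.T, hidden, rcond=None)
        reconstruction = concept_dirs.T @ coeffs  # projection onto the concept span
        return float(np.linalg.norm(reconstruction) ** 2 / np.linalg.norm(hidden) ** 2)

    # Toy usage: a 3-concept dictionary in an 8-dimensional activation space
    rng = np.random.default_rng(0)
    dirs = rng.normal(size=(3, 8))
    h = 0.9 * dirs[0] + 0.1 * rng.normal(size=8)  # mostly concept 0, plus noise
    print(f"{explained_fraction(h, dirs):.0%} of this representation is explained by the concepts")

A statement like "84 percent of the model's representation" would then correspond to this explained fraction, averaged over the activations involved in producing an answer, under the assumption above.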