logoalt Hacker News

raddantoday at 2:20 AM1 replyview on HN

I’m not clear what you mean by “know.” If you mean “the information is in the model” then I mostly agree, distributional information is represented somewhere. But if you mean that a model can actually access this information in a meaningful and accurate way—say, to state its confidence level—I don’t think that’s true. There is a stochastic process sampling from those distributions, but can the process introspect? That would be a very surprising capability.


Replies

kneyedtoday at 2:30 AM

yes:

> In this experiment, however, the model recognizes the injection before even mentioning the concept, indicating that its recognition took place internally.

https://www.anthropic.com/research/introspection