raddan • today at 2:20 AM • 1 reply

I’m not clear what you mean by “know.” If you mean “the information is in the model,” then I mostly agree: distributional information is represented somewhere. But if you mean that a model can actually access this information in a meaningful and accurate way (say, to state its confidence level), I don’t think that’s true. There is a stochastic process sampling from those distributions, but can the process introspect? That would be a very surprising capability.

Replies

kneyed • today at 2:30 AM

yes:

> In this experiment, however, the model recognizes the injection before even mentioning the concept, indicating that its recognition took place internally.

https://www.anthropic.com/research/introspection
