No. The how is relevant here because it leads to understanding of the resulting behavior.
If you train the LLM on a corpus that shows people saying the sky is red, you get an LLM that is predisposed to say the sky is red. This is true even if it's also trained on all of the science that explains how and why the sky is blue.
If it were to "figure out" or "reason", it would not have such a predisposition to emit "red" after "the sky is" just because that matches the reward during training.
In other words, the token prediction is important because it both explains the successes AND the failures of the LLM. If there were situations in which a bird could fail to fly, then how it tried to fly would also be crucial knowledge.
No. The how is relevant here because it leads to understanding of the resulting behavior.
If you train the LLM on a corpus that shows people saying the sky is red, you get an LLM that is predisposed to say the sky is red. This is true even if it's also trained on all of the science that explains how and why the sky is blue.
If it were to "figure out" or "reason", it would not have such a predisposition to emit "red" after "the sky is" just because that matches the reward during training.
In other words, the token prediction is important because it both explains the successes AND the failures of the LLM. If there were situations in which a bird could fail to fly, then how it tried to fly would also be crucial knowledge.