Those are all just optimizations.
We still don’t really know why they work; we just know how to build them.
Hm, I wonder if it's more that we're shocked such a simple thing (relatively speaking) can work so well.
We do know how they work. They predict the statistically most likely next token.
The "bitter lesson" is that fake-it-till-you-make-it is a valid way of doing knowledge work.
(Or you don't make it, and then people just claim you're holding the LLM wrong and it's not the AI's fault.)
We don't really know why language works with humans, either. If you raise a baby from birth, you can kind of observe how it learns language, but the process is also rather mysterious. My eldest son's first word was actually an imitation of a cow mooing, followed by the motor noise of a tractor or truck, and then a meow. (His first complete sentence was "King Graham fell"...)
My next child took a completely different path to language, including skipping all the non-verbal imitations.
And then at some point, you can suddenly communicate two-way with them when you couldn't before, and after that, they can engage in reasoning.