Your casual understanding is imprecise.
At all times the LLM is, indeed, predicting the next token. Anything it does emerges from that.
It did not "figure anything out". It predicted that text describing the use of a radial gradient was likely to follow text describing your problem.
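For a rough picture of what "predicting the next token" means mechanically, here is a toy sketch. It assumes a bigram count table and greedy decoding, which is vastly simpler than a real LLM (no neural network, no sampling, whole words as tokens). The names `train_bigram` and `generate` and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_new_tokens):
    """Autoregressive loop: repeatedly append the most likely next token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # nothing ever followed this token in the toy corpus
        # Greedy decoding: pick the single most frequent follower.
        next_token = followers.most_common(1)[0][0]
        out.append(next_token)
    return out


# Hypothetical one-sentence "training set", assumed for the example.
corpus = "use a radial gradient to fade the edge".split()
model = train_bigram(corpus)
print(" ".join(generate(model, ["use"], 4)))
```

The loop never does anything except pick a likely continuation of what is already there; a real model swaps the count table for a learned network conditioned on the whole context, but the outer loop has the same shape.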
Lol, the bird did not 'fly' - it just flapped its wings and generated lift!
>At all times the LLM is, indeed, predicting the next token
The point is that saying they're just "predicting the next token" explains nothing and offers no insight. Saying the brain is just firing action potentials gives you no understanding of how the brain does what it does or what the space of its capabilities is. Similarly, "predicting the next token" tells you nothing about the capabilities of LLMs.