LLMs are built to predict the next token, and are indeed trained to do so on enormous bodies of text. But to be really good at predicting the next token (roughly, a word) at the end of a long string of text, you must understand what the text means. If I give you the entire text of a long novel and at the end ask you a single yes/no question about the plot, you only need to emit a single token, but emitting the correct one implies having understood the plot of the novel. This is what LLMs do. They're generating meaningful, coherent text, which implies understanding and cognition at a level that is much deeper than that of the single token they generate at each forward pass. Internally, the LLM has learned to represent the meaning of the entire prompt text, the concepts it implies, and its possible continuations far beyond the horizon of simply outputting the next token.
> This is what LLMs do. They're generating meaningful, coherent text
No, they generate grammatically coherent text. That is because human language grammars are fundamentally mathematical structures that can be approximated with matrix operations.
They don't generate meaningful text because they have no inherent knowledge of the world.
If you've used LLMs for any amount of time you've already noticed how often they get confused about numeric quantities, for example mixing up "bigger than" and "less than" or being unable to count the letters in a word.
This is because any meaning in their output is only accidental.