> They find the probability of every word that could come next.
If we're being pedantic, they find a* probability for every token (which is sometimes a word) that could come next.
What actually ends up being chosen depends on what the rest of the system does, but generally it will just choose the most probable token before continuing.
* Saying "the" probability would be giving a bit too much credit. And really, calling it a probability at all is a bit of a misnomer when most systems would choose the same word every time. During inference the number is generally a priority, not a probability.
I was using the term "word" to be consistent with the previous comment. It need not be a word, or even text at all.
Most systems choosing the high-probability thing is what probability is.
They're just relative scores. If you assume they add to one and select one based on that, it's a probability.
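To make the distinction concrete, here's a minimal sketch in Python (standard library only). The token list and scores are made up for illustration, and softmax is just the usual way to turn relative scores into values that add to one; I'm not claiming any particular system does exactly this. `pick_greedy` and `pick_sampled` are hypothetical helper names.

```python
import math
import random


def softmax(scores):
    """Turn arbitrary relative scores into non-negative values that sum to 1."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_greedy(tokens, scores):
    """Always return the highest-scoring token; only the ranking matters."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return tokens[best]


def pick_sampled(tokens, scores, rng):
    """Draw one token at random, weighted by the normalized scores."""
    probs = softmax(scores)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates and scores, invented for this example.
tokens = ["cat", "dog", "car"]
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)
greedy = pick_greedy(tokens, scores)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [pick_sampled(tokens, scores, rng) for _ in range(1000)]

print(probs)
print(greedy)
print({t: samples.count(t) for t in tokens})
```

With greedy picking the same token comes back every time, so the scores only act as a ranking (the "priority" reading). With sampling, the normalized values behave like probabilities: the top token comes up most often, but not always.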