So we've basically taken the concept of branch prediction from CPUs and applied it to LLMs?
The concept of predicting future elements in a series is not specific to CS. It's older than computers.
Well, the TPUs they're running on don't have branch prediction, so that had to end up somewhere in the stack.
Maybe at a very high level of abstraction, but there's no branching involved.