There's some statistical nuance here. LLMs output predicted probabilities of the next token, but no modern LLM predicts the next token by taking the highest probability (temperature = 0.0), but instead uses it as a sampling distribution (temperature = 1.0). Therefore, output will never be truly deterministic unless it somehow always predicts 1.0 for a given token in a sequence.
With the advancements in LLM posttraining, they have gotten better at assigning higher probabilities to a specific token which will make it less random, but it's still random.
There's some statistical nuance here. LLMs output predicted probabilities of the next token, but no modern LLM predicts the next token by taking the highest probability (temperature = 0.0), but instead uses it as a sampling distribution (temperature = 1.0). Therefore, output will never be truly deterministic unless it somehow always predicts 1.0 for a given token in a sequence.
With the advancements in LLM posttraining, they have gotten better at assigning higher probabilities to a specific token which will make it less random, but it's still random.