> Namely setting temperature to zero, and turning off all history
That's not nearly enough, though. Multi-node/multi-GPU inference, and in particular batching (and the ordering of requests within a batch), has non-deterministic consequences in current LLM serving stacks.
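A quick way to see why: floating-point addition is not associative, so the reduction order inside a batched kernel can perturb logits in the low bits, and when the top two logits are nearly tied that's enough to flip an argmax even at temperature 0. A minimal NumPy sketch of the underlying effect (illustrative only, not tied to any particular serving stack):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

# Same values, summed in two different orders -- analogous to the
# same activations being reduced in a different order depending on
# how the request was batched.
s_forward = np.sum(x)
s_shuffled = np.sum(rng.permutation(x))

print(s_forward, s_shuffled, s_forward == s_shuffled)
# Typically differs in the last few bits of the float32 result.
```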
True, but for small models it's pretty close. See my comment below about other cases leading to nondeterminism.