Matrix multiplication on GPUs is non-deterministic, as are ops like cumsum().
https://docs.pytorch.org/docs/2.11/generated/torch.use_deter...
This comes down to parallel reductions and floating-point addition's lack of associativity: the order in which partial sums are combined varies between runs, so the result varies too. You see the same thing with OpenMP on CPUs.
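The associativity point doesn't need a GPU to demonstrate; a minimal Python sketch with plain doubles:

```python
# Floating-point addition is not associative, so the reduction order
# (which varies across GPU thread blocks or OpenMP threads) can change
# the final sum even though the inputs are identical.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # one reduction order
right = a + (b + c)   # another reduction order

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

A parallel reduction is effectively picking one of these groupings at random per run, which is why bitwise-identical results aren't guaranteed.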
People are constantly claiming determinism in LLMs that is just not there.
well just run all inference on the cpu, single threaded /s
Even if inference were bitwise reproducible, realistically most people are using a hosted service like Claude that makes no guarantee the model or hardware hasn't changed underneath them. Which is fine; most use cases don't need reproducibility.
This is interesting though, I didn't know PyTorch had a debug mode for reproducibility.
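For anyone curious, the flag the docs link above points at is torch.use_deterministic_algorithms. A minimal sketch of opting in (assuming a recent PyTorch; the CUBLAS_WORKSPACE_CONFIG env var is only needed for deterministic cuBLAS on CUDA):

```python
import os

# Needed for deterministic cuBLAS on CUDA >= 10.2; must be set
# before CUDA is initialized. Harmless on CPU-only runs.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

# Ops that have no deterministic implementation will now raise a
# RuntimeError instead of silently running a non-deterministic kernel.
torch.use_deterministic_algorithms(True)

print(torch.are_deterministic_algorithms_enabled())  # True
```

Note it's a debugging tool more than a production setting: deterministic kernels are often slower, and some ops simply error out rather than run.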