Hacker News

nnevatie · today at 4:55 AM · 1 reply

> Namely setting temperature to zero, and turning off all history

That's not nearly enough, though. Multi-node/multi-GPU inference, and batching in particular (including the ordering within a batch), introduce nondeterminism in current LLM services.
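The root cause is that floating-point addition is not associative, so when batching or multi-GPU scheduling changes the order in which a reduction is performed, the result can change even with temperature 0. A minimal illustration (the specific values are chosen for demonstration, not taken from any LLM service):

```python
# Floating-point addition is not associative: summing the same
# values in a different order can give a different result.
xs = [1.0, 1e16, -1e16]

forward = sum(xs)             # (1.0 + 1e16) rounds away the 1.0, then -1e16 → 0.0
backward = sum(reversed(xs))  # (-1e16 + 1e16) cancels exactly, then +1.0 → 1.0

print(forward, backward)      # 0.0 1.0
```

In a real inference stack the same effect appears in large matrix-multiply and attention reductions, where the summation order depends on batch composition and kernel scheduling.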


Replies

2ndorderthought · today at 11:59 AM

True, but for small models it's pretty close. See my comment below about other cases that lead to nondeterminism.