> Namely setting temperature to zero, and turning off all history
That's not nearly enough, though. Multi-node/multi-GPU inference, and in particular batching (and the ordering of requests within a batch), has non-deterministic consequences in current LLM serving stacks.
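A quick way to see why: floating-point addition is not associative, so the reduction order inside a batched kernel can perturb logits in the low bits, and when the top two logits are nearly tied that's enough to flip an argmax even at temperature 0. A minimal NumPy sketch of the underlying effect (illustrative only, not tied to any particular serving stack):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

# Same values, summed in two different orders -- analogous to the
# same activations being reduced in a different order depending on
# how the request was batched.
s_forward = np.sum(x)
s_shuffled = np.sum(rng.permutation(x))

print(s_forward, s_shuffled, s_forward == s_shuffled)
# Typically differs in the last few bits of the float32 result.
```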
True, but for small models it's pretty close. See my comment below about other cases leading to nondeterminism.