For the uninitiated: Interestingly, it is not advisable to take this to the extreme and set temperature to 0.
That would seem logical, as the results are then completely deterministic, but it turns out that a suboptimal token may result in a better answer in the long run. Also, allowing for a little bit of noise gives the model room to talk itself out of a suboptimal path.
For the uninitiated: Interestingly, it is not advisable to take this to the extreme and set temperature to 0.
That would seem logical, as the results are then completely deterministic, but it turns out that a suboptimal token may result in a better answer in the long run. Also, allowing for a little bit of noise gives the model room to talk itself out of a suboptimal path.