
hleszek • yesterday at 9:22 PM • 2 replies

Why not allow the user to provide the seed used for the generation? That way we could at least detect whether the model has changed: if the same prompt with the same seed suddenly gives a new answer (assuming they don't cache answers), something is different. You could also compare different providers that supposedly serve the same model, and if the model is open-weight you could even run the comparison yourself on your own hardware or on rented GPUs.
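
A minimal sketch of that check, assuming a hypothetical `generate(prompt, seed)` callable that wraps whatever provider API is in use (it is a placeholder, not a real library function). The idea is to record a hash of the output once and compare later runs against it:

```python
import hashlib
from typing import Callable


def fingerprint(text: str) -> str:
    """Return a stable SHA-256 hex digest of a model's output."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(
    generate: Callable[[str, int], str],  # hypothetical wrapper around a provider API
    prompt: str,
    seed: int,
    recorded: str,
) -> bool:
    """True if the output for (prompt, seed) no longer matches the recorded fingerprint."""
    return fingerprint(generate(prompt, seed)) != recorded


# Stand-in "models" for illustration only; a real check would call a provider.
def fake_model_v1(prompt: str, seed: int) -> str:
    return f"{prompt}:{seed}:v1"


def fake_model_v2(prompt: str, seed: int) -> str:
    return f"{prompt}:{seed}:v2"


baseline = fingerprint(fake_model_v1("hello", 42))
print(has_changed(fake_model_v1, "hello", 42, baseline))  # same model, same seed
print(has_changed(fake_model_v2, "hello", 42, baseline))  # silently swapped model
```

This only works if the provider's output really is deterministic for a fixed seed and is not served from a cache.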

Replies

bthornbury • yesterday at 9:27 PM

AFAIK seed determinism can't really be relied upon between two machines, maybe not even between two different GPUs.
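
One commonly cited reason, offered here as background rather than something stated in the thread: floating-point addition is not associative, so hardware or kernels that sum the same numbers in a different order can end up with slightly different results. A tiny illustration in plain Python, with no GPU involved:

```python
# Same three numbers, two different summation orders.
values = [0.1, 0.2, 0.3]

left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# The two results differ in the last bits even though the math is "the same".
print(left_to_right == right_to_left)
print(abs(left_to_right - right_to_left))
```

A difference this small is usually harmless, but when two candidate tokens have nearly equal probability it may be enough to change which one gets sampled, after which the outputs diverge.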


bthornbury • yesterday at 9:30 PM

Something like a perplexity/log-likelihood measurement across a large enough number of prompts/tokens might give you the same assurance in a statistical sense, though. I expect those comparison percentages at the top are something like that.
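
A rough sketch of what such a measurement could look like, assuming each provider can return per-token log-probabilities (natural log) for the same fixed text. The function names and the sample numbers are made up for illustration and are not how the comparison at the top is known to be computed:

```python
import math
from typing import Sequence


def mean_log_likelihood(token_logprobs: Sequence[float]) -> float:
    """Average natural-log probability per token."""
    return sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity is exp of the negative mean log-likelihood."""
    return math.exp(-mean_log_likelihood(token_logprobs))


def relative_gap(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Relative perplexity difference of candidate vs. reference (0.0 means identical)."""
    ref = perplexity(reference)
    return (perplexity(candidate) - ref) / ref


# Hypothetical log-probs for the same tokens from two providers.
reference_logprobs = [-0.20, -1.10, -0.05, -0.70]
candidate_logprobs = [-0.25, -1.30, -0.05, -0.90]

print(f"reference perplexity: {perplexity(reference_logprobs):.3f}")
print(f"candidate perplexity: {perplexity(candidate_logprobs):.3f}")
print(f"relative gap: {relative_gap(reference_logprobs, candidate_logprobs):.1%}")
```

With only four tokens the gap is meaningless; the point of the approach is that over many prompts and tokens the average becomes stable enough to compare, even when individual samples are not reproducible.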
