You need to set the temperature to 0 and tune your prompts for automated workflows.
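For reference, here's roughly what that looks like with the OpenAI Python client (a sketch: the model name, prompt, and use of the best-effort `seed` parameter are illustrative, not anything the parent specified):

```python
# Minimal sketch: deterministic-ish decoding via temperature=0.
# Model name and prompt are illustrative; `seed` is a best-effort
# reproducibility hint, not a guarantee.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    messages=[{"role": "user", "content": "Classify this ticket: ..."}],
    temperature=0,        # always pick the most likely token
    seed=42,              # best-effort determinism hint
)
print(response.choices[0].message.content)
```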
Have you tried this? It doesn't work, because of the way inference runs at big companies: they aren't just running your query in isolation.
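The usual culprit: your request is batched with whoever else is hitting the service, the batch shape changes the order of the floating-point reductions inside the kernels, and floating-point addition isn't associative, so near-tied logits can flip even at temperature 0. A toy illustration of the underlying effect (my sketch, not from the comment):

```python
# Floating-point addition is not associative: summing the same numbers
# in a different order gives a (slightly) different result, which is
# enough to flip an argmax between two near-tied logits.
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

in_order = sum(xs)
shuffled = xs[:]
random.shuffle(shuffled)
reordered = sum(shuffled)

print(in_order == reordered)      # typically False
print(abs(in_order - reordered))  # tiny, but nonzero
```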
Maybe it can work if you're running your own inference.
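If you do control the stack, greedy decoding with fixed seeds on fixed hardware is generally repeatable run-to-run. A sketch, assuming the Hugging Face transformers library ("gpt2" is just a placeholder):

```python
# Sketch of locally controlled inference with greedy decoding.
# Assumes Hugging Face transformers; "gpt2" is a placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.manual_seed(0)
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

inputs = tok("Classify this ticket:", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)  # greedy
print(tok.decode(out[0], skip_special_tokens=True))
```

Even then, it's only repeatable for a fixed batch size, kernel version, and GPU, so the determinism doesn't port across setups.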
It doesn't really solve it: a slight shift in the prompt can have totally unpredictable results anyway. And if your prompt were always exactly the same, you'd just cache the result and bypass the LLM entirely.
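The caching half of that is trivially true; an exact-match cache (hypothetical helper, my naming) only ever helps when prompts repeat byte-for-byte:

```python
# Hypothetical exact-match prompt cache: skip the model entirely when
# the prompt is byte-for-byte identical to one we've seen before.
import hashlib
from typing import Callable

_cache: dict[str, str] = {}

def cached_complete(prompt: str, call_model: Callable[[str], str]) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)  # only hit the LLM on a miss
    return _cache[key]
```

Change one character in the prompt and the hash, and hence the cache entry, is different, which is exactly why exact caching doesn't address prompt sensitivity.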
What would really be useful is a guarantee that a very similar prompt always gives a very, very similar result.
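In other words, something like continuity: small edits to the prompt should produce small changes in the output. You could at least measure how far a model is from that today with a paraphrase test (everything below, including the stub, is illustrative):

```python
# Sketch of a paraphrase-stability check. `complete` is a stub; swap in
# a real model call. Desired property: near-identical prompts give
# near-identical outputs.
def complete(prompt: str) -> str:
    return prompt.lower()  # placeholder standing in for a model call

paraphrases = [
    "Summarize this ticket in one sentence: <ticket>",
    "In one sentence, summarize this ticket: <ticket>",
]

outputs = [complete(p) for p in paraphrases]
print("stable across paraphrases:", len(set(outputs)) == 1)
```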