My biggest problem with LLMs at this point is that they produce different, inconsistent results given the same prompt. Better grounding would be amazing here. I want to give an LLM the same prompt on different days and be able to trust that it will do the same thing as yesterday. Currently they misbehave multiple times a week and I have to manually steer them a bit, which destroys certain automated workflows completely.
> I want to give an LLM the same prompt on different days and I want to be able to trust that it will do the same thing as yesterday
Bad news, it's winter now in the Northern hemisphere, so expect all of our AIs to get slightly less performant as they emulate humans under-performing until Spring.
You need to set the temperature to 0 and tune your prompts for automated workflows.
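For concreteness, here's a minimal sketch of what that looks like as a request payload. The field names follow the OpenAI chat completions API; the model name and seed value are placeholders, and note that even with temperature 0 and a fixed seed, determinism is only best-effort (batching and floating-point effects can still vary results).

```python
import json

def build_request(prompt: str) -> dict:
    """Build a chat request tuned for repeatability rather than creativity."""
    return {
        "model": "gpt-4o-mini",  # placeholder: use whatever model your workflow targets
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # near-greedy decoding: removes sampling randomness
        "seed": 42,        # best-effort reproducibility, not a hard guarantee
    }

payload = build_request("Summarize this ticket as JSON.")
print(json.dumps(payload, indent=2))
```

Pinning these parameters narrows the variance, but the parent's complaint still stands: model updates on the provider side can change behavior day to day regardless of your settings.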
It sounds like you have dug into this problem in some depth, so I would love to hear more. When you've tried to automate things, I'm guessing you've got a template plus some data, and the same or similar input gives totally different results? What details can you share about how different the results are? Are you asking for, e.g., JSON output and it totally isn't JSON, or is it a more subtle difference?