This is what I do in llm-consortium. An arbiter evaluates the response(s) and decides if more iterations are needed. You can also loop until a minimum confidence threshold, but self-reported confidence isn't a great metric.