I never trust the opinion of a single LLM anymore, especially on more complex projects. I have seen Claude guarantee something is correct and then immediately apologize when I feed it a critical review from Codex or Gemini. And many times the issues are not minor: they are significant, critical oversights by Claude.
My habit now: always get a 2nd or 3rd opinion before assuming one LLM is correct.
It doesn’t have to be different foundation models. As long as the temperature is up, you can ask the same model 100 times.
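A rough sketch of what that resampling could look like, in Python. The `ask` callable is a placeholder for whatever client you use to query the model at a non-zero temperature, and `poll_model` is a name made up for this example, not any library's API. It just asks the same prompt `n` times and reports the most common answer plus how often it came up.

```python
from collections import Counter
from typing import Callable


def poll_model(ask: Callable[[str], str], prompt: str, n: int = 5) -> tuple[str, float]:
    """Ask the same prompt n times; return (most common answer, agreement ratio).

    `ask` is a hypothetical stand-in for a call to your model of choice,
    assumed to sample with temperature > 0 so that answers can differ.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Normalize lightly so "Yes" and "yes " count as the same vote.
    answers = [ask(prompt).strip().lower() for _ in range(n)]
    top_answer, votes = Counter(answers).most_common(1)[0]
    return top_answer, votes / n
```

Exact-match voting like this only works for short, closed answers (yes/no, a number, a label). For free-form reviews you would have to compare the answers some other way, and a low agreement ratio is probably the more useful signal anyway: it tells you the model is not sure.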
Happy to see someone else doing this.
All code written by an LLM is reviewed by a second LLM. Then I verify that review myself and have one of the agents iterate on everything.
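Roughly, that loop could be sketched like this in Python. Every name here (`review_loop`, `write`, `review`, `verify`, `revise`) is hypothetical: `write` and `revise` stand for calls to the authoring agent, `review` for the second LLM, and `verify` for the human check of the review. Stopping when the human rejects the critique is an assumption of this sketch, not a description of anyone's actual setup.

```python
from typing import Callable, Optional


def review_loop(
    write: Callable[[], str],
    review: Callable[[str], Optional[str]],
    verify: Callable[[str], bool],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Write, review, verify, revise; return (final draft, revisions made)."""
    draft = write()
    for revisions in range(max_rounds):
        critique = review(draft)  # second model; None means "no issues found"
        # Assumption: stop if the reviewer is satisfied or the human
        # judges the critique to be wrong.
        if critique is None or not verify(critique):
            return draft, revisions
        draft = revise(draft, critique)
    # Round limit reached; the last draft has not been reviewed again.
    return draft, max_rounds
```

The `max_rounds` cap is there because two models can keep finding things to change forever.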