I've found this orchestrator+reviewer+judge setup to yield much better results than anything else I've tried. And it's such a simple setup: just a few markdown files.
I'm also building a similar one, purpose-built for making the plans that this setup can orchestrate. It still needs some tweaking: agents don't follow it reliably yet, and it takes additional prompting to nudge them down the proper path. But I've seen similar benefits: sending plans through this adversarial review loop has significantly improved the final output.
https://github.com/Vibecodelicious/llm-conductor/blob/main/p...
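For anyone wondering what an adversarial review loop like this might boil down to, here is a minimal Python sketch. It is only an illustration under assumptions, not what the markdown files in the repo actually say: the agents are stubbed as plain functions, the division of labor (reviewer raises objections, judge decides which ones stand, orchestrator revises) is a guess at the general shape, and every name (`review_loop`, `Verdict`, the `toy_*` helpers) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    accepted: bool      # True when the judge sees nothing left to fix
    upheld: List[str]   # reviewer objections the judge agreed with


def review_loop(
    draft: str,
    revise: Callable[[str, List[str]], str],
    review: Callable[[str], List[str]],
    judge: Callable[[str, List[str]], Verdict],
    max_rounds: int = 3,
) -> str:
    """Hypothetical loop: review, judge, revise, until accepted or out of rounds."""
    for _ in range(max_rounds):
        objections = review(draft)
        verdict = judge(draft, objections)
        if verdict.accepted:
            return draft
        # Only the objections the judge upheld go back for revision.
        draft = revise(draft, verdict.upheld)
    return draft


# Toy stand-ins for the three agents (assumptions, not real LLM calls).
def toy_review(draft: str) -> List[str]:
    return [] if "tests" in draft else ["no test step"]


def toy_judge(draft: str, objections: List[str]) -> Verdict:
    return Verdict(accepted=not objections, upheld=objections)


def toy_revise(draft: str, upheld: List[str]) -> str:
    return draft + "\n- add tests"


final = review_loop("- implement feature", toy_revise, toy_review, toy_judge)
print(final)
```

In a real setup each of the three callables would presumably be a separate agent with its own prompt file; the `max_rounds` cap is there so a stubborn reviewer can't keep the loop going forever.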
Unrelated but this just happened and I thought of you ;-)
I don't know what's wrong with your Codex, but mine can't bring itself to break the rules.