Can you point to some examples? Also I wonder if this needs to be very model/harness specific? Like even model version, subversion.
And probably that should be run in different harness or with custom system prompt? Since they introduce quirks and glitches as well.
(somehow this motivated me to resurrect HN account)
I think you’re probably overthinking it. The same model will in my experience find errors in the same thing it just generated. I think reviewing is a totally different place in latent space, than implementing. Anyhow, it works well for me.
I often tell codex to launch a subagent without prior context to „remove BS phrases and make the prose sound more natural and higher readability“. That‘s usually enough to get better results.