Spawning separate agents to review the original agent's implementation results in a very noticeable increase in code quality and decrease in bugs. This is why I encode two or three rounds of sub-agent review during the planning process, where I tell the agent authoring the plan to include those review rounds at the end. If the code is particularly load-bearing, I then ask a fourth agent, usually from the other frontier lab.
All of this burns more tokens of course, but probably way less than coming back to the code later to fix bugs. It is also slower, but in the long run saves time.
Have you found integrating outputs from different frontier labs consistently improves final results, or is it just kind of voodoo?