Thankfully it's not as bad as that. The 50% that goes red means we re-execute those steps, potentially several times, to see if they succeed, before we even bother manually looking at it. But the overall principle holds: First yoou multiply the cost by re-running, then eventually you either need to kick it up to a more expensive model and/or a human.
But of course this is also only viable for non-latency sensitive work, for starters.