If you're "proofreading" the agents' work in detail, you're doing it wrong. Invest that time productively in planning out what the agents are going to do (with AI help, of course); once the plan has gotten detailed enough, set the agent to work and treat the result as something to read through and quickly accept, revise, or reject (and on rejection, go back to an earlier stage of planning and revise that instead). Planning at the outset keeps you in the driving seat and avoids frustration; the agents are just a multiplier that operates downstream of your design decisions.
Yeah, building acceptance criteria first is the way. An LLM is a goal machine: it applies probability over and over to advance toward its goal(s). That's all it is and all it wants to do. So giving it well-defined, granular goals and guardrails will get the best results.
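One concrete way to hand an agent "well defined and granular goals" is to write the acceptance criteria as executable checks before the agent writes any code. A minimal sketch: `slugify` is a hypothetical task you might assign, and the small implementation stands in for the agent's output — the criteria, not the implementation, are the point.

```python
import re

def slugify(title: str) -> str:
    # Stand-in for the agent's output; its only job is to make the
    # acceptance criteria below pass.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# Acceptance criteria written up front: granular input/output pairs
# that act as the agent's goals and guardrails.
ACCEPTANCE = [
    ("Hello, World!", "hello-world"),
    ("  spaces  everywhere ", "spaces-everywhere"),
    ("already-a-slug", "already-a-slug"),
]

for raw, expected in ACCEPTANCE:
    assert slugify(raw) == expected, (raw, slugify(raw))
print("all acceptance criteria pass")
```

With the criteria fixed in advance, "review" collapses to running them: the agent either hit the goals or it didn't.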
There is a fine line between "not proofreading" and "not paying attention to the output at all." Many things look like they work but won't pass a sniff test, especially when it comes to security or performance. I witnessed agents create "private" endpoints that had no authentication but accepted user IDs in the request payload and trusted them.