IME, output quality seems directly proportional to the amount of engineering effort you put in. If a bug happens and you just tell the model to fix it over and over with no critical thinking, you end up with an 800-line shell script meant to change the IP address on an interface (real example). If you stop and engage your brain to reason about the bug and explain the problem, the model can fix it in an acceptable manner.
If you want good results, you still have to be an engineer about it. The model multiplies the effort you put in. If your effort and input are near zero, you get near-zero quality out. If you do the real work and relegate the model to coloring inside the lines, you get excellent results.
Even my guardrails can’t replace experience. You have to pay attention. Not paying attention is exactly how some devs land in whack-a-mole loops.