If you are really good and fast at validating/fixing code output, or you aren't actually validating it beyond making sure it runs (no judging), I can see it paying off 95% of the time.
But from what I've seen validating both my own and others' coding-agent outputs, I'd estimate a much lower percentage (Data Engineering/Science work). And, oh boy, some colleagues are hooked on generating no matter the quality. Workslop is a very real phenomenon.
This matches my experience using LLMs for science. Out of curiosity, I downloaded a randomized study and the CONSORT checklist, and asked Claude Code to review the study using the checklist.
I was really impressed with how it parsed the structured checklist. I was not at all impressed by how it digested the paper. Lots of disguised errors.