You (in theory) have more control over the quality of the team you are managing, than the quality of the models you are using.
And the quality of the code models put out is, in general, well below the average output of a professional developer.
It is, however, much faster, which makes the gambling loop feel better. Buying and holding a stock for a few months doesn't feel the same as playing a slot machine.
You have a lot of control over LLM quality. There are different models available, and even different effort settings on the same model produce different outcomes.
E.g. look at the "SWE-Bench Pro (public)" heading in this page: https://openai.com/index/introducing-gpt-5-4/ , showing reasoning efforts from none to high.
Of course, they don't learn like humans, so you can't do the trick of hiring someone less senior but with great potential and then mentoring them. Instead it's more of an up-front price you have to pay. The top models at their highest settings obviously form a ceiling, though.
What theory is that?
My experience is the absolute opposite. I am much more in control of quality with AI agents.
I am never letting juniors or mid-levels onto my team again.
In fact, I am not sure I will allow any form of manual programming in a year or so.
One difference is that those developers are moral subjects who feel bad if they screw up, whereas a computer is not a moral subject and can never be held accountable.
https://simonwillison.net/2025/Feb/3/a-computer-can-never-be...