I of course cannot say what the future holds, but current frontier models are - in my experience - nowhere near good enough for such autonomy.
Even with other agents reviewing the code, good test coverage, etc., both smaller - and every now and then larger - mistakes make their way through, and the existence of such mistakes in the codebase tends to accelerate the introduction of even more of them.
It certainly depends on many factors, but I have seen enough to feel confident that we are not there yet.