They don't work very well even on relatively small tasks, and even with a virtually impractical amount of upfront engineering: https://news.ycombinator.com/item?id=47752626
They just make a lot of mistakes that compound and that they don't catch themselves. They currently need to be very closely supervised if you want the codebase to continue to evolve for any significant amount of time. They do work well when you detect their mistakes and tell them to revert.