The code itself does not matter. If the tests pass, and the tests are good, then who cares? AI will be maintaining the code.
Tests don't cover everything. Performance? Edge cases? Resource usage? None of these are typically covered by tests.
An example: the model had a complete hash map interface available. The task was to delete elements. Instead of using the hash map API, it iterated through the entire underlying array to remove a single entry, so the expected O(1) solution became O(n). These decisions compound. The software may technically work, but the user experience suffers.
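A minimal sketch of that kind of regression, assuming a dict-backed store (the class and function names here are hypothetical, not the actual code from that example):

```python
# Hypothetical names (UserStore, remove_user); a sketch of the pattern,
# not the original code.

class UserStore:
    """Hash-map-backed store that already exposes an O(1) delete."""

    def __init__(self):
        self._users = {}  # user_id -> record

    def add(self, user_id, record):
        self._users[user_id] = record

    def delete(self, user_id):
        # Intended path: the hash map's own O(1) removal.
        self._users.pop(user_id, None)


# What the model produced instead (paraphrased): rebuild the whole
# collection to drop one entry, i.e. O(n) per delete.
def remove_user(store, user_id):
    store._users = {k: v for k, v in store._users.items() if k != user_id}


# A plain correctness test passes either way, which is why "the tests
# pass" says nothing about asymptotic cost or resource usage.
def test_delete_removes_user():
    store = UserStore()
    store.add(1, "alice")
    remove_user(store, 1)  # the O(n) version still yields the right state
    assert 1 not in store._users
```

Both versions leave the store in the same state, so a functional test suite can't tell them apart; only profiling or review catches the difference.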
That's assuming no human ever goes near the code, that it doesn't get out of hand over time (inference time and token limits are real constraints), and that anti-patterns don't accumulate until the code is a logical mess that produces bugs through a web of specific behaviors instead of proper architecture.
However, I guess at least some of that can be mitigated by distilling out a system description and then running agents over it again to refactor the entire thing.
The next iterations of models will have to deal with that code, and it will become harder and harder to fix bugs and introduce features without triggering or introducing more defects.
Biological evolution overcomes this by running thousands and millions of variations in parallel and letting the more defective ones crash and die. In software ecosystems, we can't afford such a luxury.