I've noticed this in my small scale tests. Basically the larger the prompt gets (and it includes all the previously generated code because that's what you want to add features to), the more likely is that the LLM will go off the rails. Or forget the beginning of the context. Or go into a loop.
Now if you're using a lot of separate prompts where you draw from whatever the network was trained on and not from code that's in the prompt, you can get usable stuff out of it. But that won't build you the whole application.