You don't think these errors compound? Generated code involves hundreds of little decisions. Yes, it "usually" works.
Not in my experience. With a proper TDD framework it does better than most programmers at a company, who anecdotally introduce a bug every 2-3 tasks.
Errors compounding is a meme. In domains that are both iterated and verifiable, errors dilute instead of compounding, because the LLM has repeated chances to notice its failures.
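
For a rough back-of-the-envelope version of both positions, here is a toy model. It is only a sketch under strong assumptions that nobody in the thread has measured: every decision is independent, each one goes wrong with probability p, each verification round (say, a test run) catches a wrong decision with probability d, and a caught error always gets fixed. The function names and the numbers are made up for illustration.

```python
def p_all_correct(n, p):
    """Chance that n independent decisions are all right, with no checking."""
    return (1 - p) ** n


def p_all_correct_with_checks(n, p, d, k):
    """Same, but each wrong decision gets k verification rounds.

    Assumption: each round catches the error with probability d,
    and a caught error is always fixed. An error survives only if
    it slips past all k rounds.
    """
    residual = p * (1 - d) ** k  # per-decision error rate after k rounds
    return (1 - residual) ** n


# Hypothetical numbers: 100 decisions, 1% error rate each.
print(p_all_correct(100, 0.01))                      # roughly 0.37
# Same code, three test rounds that each catch 80% of errors.
print(p_all_correct_with_checks(100, 0.01, 0.8, 3))  # roughly 0.99
```

Without checks, the "compounding" worry holds in this model: small per-decision error rates multiply into a large chance of at least one mistake. With repeated checks the surviving error rate shrinks geometrically in k, which is the "dilution" argument. Which regime you are in depends on how good d really is, and on whether errors are as independent as the model pretends.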
LLMs: sometimes wrong but never in doubt.