I'm confused, this doesn't make sense. The target they're iterating on (UI) is the same one whose quality they're assessing, not a different one (source code).
You're suggesting that (a) their UI skills are lacking (based on what? isn't UI exactly what they were iterating on and trying to improve?), and (b) a real UI expert would somehow have found the UI they were working on consistently garbage, no matter how many times they iterated on it?
Which means you're saying you don't believe anyone can actually produce high-quality (to an expert) output with AI on the same target they're working on, and that if they think they can, it just means they don't have a good sense of quality?
Without proper training, what looks good to you may be trash. I always thought pixel art generated by diffusion models looked damn good. Then I started watching and reading reviews by actual pixel artists, and all they saw was flaws. And it wasn't just nitpicking: the flaws were fundamental, difficult to fix, and would look awful, amateurish, and distracting to the player in production.
It's not confusing. It makes sense.