My mental model for LLMs is that I don't expect them to chew gum and walk at the same time. Cleaning code up is a different task from building new functionality.
GLM always feels like it's doing things smarter, until you actually review the code. So you still need the build/prune cycle. That's my experience anyway.