My max20 sub has been sitting mostly unused since April. Codex with 5.4 (and now 5.5), even with fast mode (= double token costs), is night and day. Opus produces convincing failures: it either forgets half the important details or silently decides to do something "pragmatic" (read: technical-debt band-aids or worse), then claims success even with everything crashing and burning after the changes. And if you point out the errors, it makes even more of a mess. Opus works really well for one-shotting greenfield scopes, but for iterating on them later or doing complex integrations it's just unusable, even harmfully bad.
GPT 5.4+ takes its time and, unprompted, considers edge cases that turn out to be real, which saves me subsequent error-hunting turns, and in the end it delivers. Plus no "this doesn't look like malware" or "actually wait" thinking loops for minutes over a one-liner script change.
Can I get that max20 if you are not using it?
The most "productive" flow I found was when I had both memberships and let Claude do the "I go yeet your feature" side and Codex do the "WTF bro, that's full of race conditions!" review phase.
But now I just use Codex. Claude is unreliable: it leaves data races all over and, as you say, fairly consistently leaves negative conditions unhandled.
My mental model for LLMs is that I don't expect them to chew gum and walk at the same time. Cleaning code up is a different task from building new functionality.
GLM always feels like it's doing things smarter, until you actually review the code. So you still need the build/prune cycle. That's my experience anyway.