I agree it seems better at complex work. However, I find that it often tries to make ALL work complex. I had a simple bug fix where I knew exactly what the 1-2 line fix was. GPT 5.4 added something like 200 LOC and started refactoring the entire function of the app. Was the refactor possibly an improvement? Maybe, but I needed the fix quickly, so I stopped it and switched to Claude, which did exactly what I was expecting.
Perfectly mirrors my own experience.