I disagree. It has improved enormously, especially at staying consistent on long tasks. I have a task that has been running for 32 days (400M+ tokens) via Codex, and that's only since gpt-5.4.
That’s actually crazy. What kind of task is that? Is it a recurring kind of task, like some analysis, or is it coding-related?
...what? What kind of task are you running?
Has that task accomplished anything yet?