I had a pretty involved cross-module state bug with complex dependencies and interleaved reactivity issues. I tried fixing it manually multiple times within a 4h time box, and also with Claude models up to Opus 4.6 high and Codex 5.3, all of which failed. When the GPT-Pro model came out, I heard it wasn't meant to be an everyday coding model, but I tried it anyway since it looked impressive. It took a single 8h run that burned $200, with nothing in between but occasional waits for test runs and me typing "continue". After 8 hours, when I was already fearing I'd wasted the money, the bug was consistently fixed, not just the one edge case that triggered the behavior.
PS: The refactoring it did as part of the solution was a bit verbose and added a few abstractions I knew wouldn't be needed, so I asked it to remove them, but it was solid otherwise.