That's my initial experience, yes. It's hard to compare these things cleanly of course. I went through several new contexts on GPT and it just couldn't get traction -- it became hard to keep it focused on "yes there's clearly a race but what actual persistent state got broken"? It just wanted to change the thread priorities so that the problem didn't occur and kept doubling down on that as the solution. Opus made some missteps too but it responded well to my corrections - 2 or 3 significant ones along the way - and it was prepared to keep digging on my exact goal until it found the real issue.
That's my initial experience, yes. It's hard to compare these things cleanly of course. I went through several new contexts on GPT and it just couldn't get traction -- it became hard to keep it focused on "yes there's clearly a race but what actual persistent state got broken"? It just wanted to change the thread priorities so that the problem didn't occur and kept doubling down on that as the solution. Opus made some missteps too but it responded well to my corrections - 2 or 3 significant ones along the way - and it was prepared to keep digging on my exact goal until it found the real issue.