Is it? Or is it the task you're trying to do? Opus 4.6 has been staggeringly good for me this last week, both inside Claude Code and through Antigravity until I used up my quota.
Usually, Claude Code with Opus picks the right tools on its own to check the docs (for Svelte, for example), so what it gives me is usually flawless.
And right now, I have to remind it every time that the MCP exists, and even then it can't manage to find a routing bug I have in SvelteKit.
I did a lot of SvelteKit with Opus in the past and didn't have to think about it; Opus always got it right easily. Until now.
I think some of this comes down to undeclared A/B testing. I've had the worst week of interactions I've ever had using Claude Code. All week, whenever I have a session that isn't failing miserably, I seem to get tapped for a session survey, but on any that are out-and-out shitting the bed, it never asks. It has felt a little surreal. I'd love to see a product-wide stats graph for swearing; I would 100% believe it's hitting an all-time high, but maybe I'm just a victim of a bad A/B round.