Its especially concerning / frustrating because boris’s reply to my bug report on opus being dumber was “we think adaptive thinking isnt working” and then thats the last I heard of it: https://news.ycombinator.com/item?id=47668520
Now disabling adaptive thinking plus increasing effort seem to be what has gotten me back to baseline performance but “our internal evals look good“ is not good enough right now for what many others have corroborated seeing
This matches my experience as well, "adaptive thinking" chooses to not think when it should.
[dead]
[dead]
Seconded. After disabling adaptive thinking and using a default higher thinking, I finally got the quality I'm looking for out of Opus 4.6, and I'm pleased with what I see so far in Opus 4.7.
Whatever their internal evals say about adaptive thinking, they're measuring the wrong thing.