SWE-bench Verified is, at this point, contaminated https://openai.com/index/why-we-no-longer-evaluate-swe-bench...
So it os hard to tell how much of a model gain is due to skill, and how much - overfitting.