SWE-bench Verified is, at this point, contaminated

stared • today at 7:17 AM

So it is hard to tell how much of a model's gain is due to skill and how much is due to overfitting.
