Hacker News

XCSme yesterday at 7:51 PM

They mentioned on their release page that the Claude team noticed memorization of the SWE-bench test, so the test is actually in the training data.

Here: https://www.anthropic.com/news/claude-opus-4-7#:~:text=memor...


Replies

William_BB today at 6:44 AM

Good luck arguing with SWE benchmark purists