Hacker News

bermudi today at 1:38 AM

Source? On the most trusted benchmark right now (deepSWE), models score at least as well with its minimal harness as with CC or codex.


Replies

theshrike79 today at 6:57 PM

deepSWE clearly doesn't need complex tool calling?