Source? On the most trusted benchmark right now (deepSWE), scores with their minimal harness are as good as or better than with CC or Codex.
deepSWE clearly doesn't need complex tool calling?