> Claude Code is the best autonomous coding agent.
If you look at the [email protected] leaderboard, you'll quickly see it's actually one of the weakest agentic harnesses. Anthropic's own models score lower with Claude Code than with virtually any other harness.
So it's quite the opposite. Claude Code is arguably the worst harness to run models with.
Those benches are completely and totally meaningless when it comes to real-world work tasks, and everyone knows it.
Okay, but not all results on there are valid. ForgeCode, for instance, has cheated in the past:
https://debugml.github.io/cheating-agents/#sneaking-the-answ...