> Claude Code is the best autonomous coding agent. If you look at the terminal-bench@2.0 leader...

isege • today at 7:22 AM • 2 replies

> Claude Code is the best autonomous coding agent.

If you look at the terminal-bench@2.0 leaderboard, you'll quickly see it's actually one of the weakest agentic harnesses. Anthropic's own models score lower with Claude Code than with virtually any other harness.

So it's quite the opposite. Claude Code is arguably the worst harness to run models with.

Replies

DaanDL • today at 7:37 AM

Okay, but not all results on there are valid. ForgeCode, for instance, has been caught cheating in the past:

https://debugml.github.io/cheating-agents/#sneaking-the-answ...

cpursley • today at 9:29 AM

Those benches are completely and totally meaningless when it comes down to real world work tasks, and everyone knows it.
