nikcub, yesterday at 10:01 PM

the most cited is terminal bench 2.0 [0], but it's also plagued by cheating accusations and benchmaxxing.

somewhat remarkably, claude code ranks last for Opus 4.6 - which may say something about cc, or say something about the benchmark

[0] https://www.tbench.ai/leaderboard/terminal-bench/2.0