nikcub • yesterday at 10:01 PM

the most cited is terminal bench 2.0, but it's also plagued by cheating accusations and benchmaxxing.

somewhat remarkably, claude code ranks last for Opus 4.6 - which may say something about cc, or say something about the benchmark
