tosh • yesterday at 6:46 PM • 2 replies

Terminal Bench 2.0

  | Name                | Score |
  |---------------------|-------|
  | OpenAI Codex 5.3    | 77.3  |
  | Anthropic Opus 4.6  | 65.4  |

Replies

greenfish6 • yesterday at 6:47 PM

Yeah, but I feel like we are over the hill on benchmaxxing. Many times a model has beaten Anthropic on a specific benchmark, but the 'feel' is that it is still not as good at coding.

xyst • yesterday at 9:03 PM

Benchmarks are useless compared to real world performance.

Real world performance for these models is a disappointment.

alt Hacker News