goldenarm • yesterday at 6:15 PM

If you're tired of cross-referencing cherry-picked benchmarks, here's the geometric mean of SWE-bench Verified and HLE-tools for each model:

Claude Opus 4.6: 65.5%

GLM-5: 62.6%

GPT-5.2: 60.3%

Gemini 3 Pro: 59.1%
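
For anyone who wants to reproduce this kind of number, here's a minimal sketch of a two-benchmark geometric mean. The function name `geometric_mean` and the input scores are placeholders I made up for illustration, not the actual per-benchmark results behind the figures above.

```python
import math

def geometric_mean(scores):
    """Geometric mean of a list of positive percentage scores."""
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("scores must be non-empty and strictly positive")
    # Average the logs, then exponentiate: same as the n-th root of the product.
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

# Hypothetical placeholder scores (NOT real benchmark results).
example = geometric_mean([80.0, 50.0])
print(f"{example:.1f}%")  # prints 63.2%
```

Note that the arithmetic mean of those same placeholder scores would be 65.0%, so the geometric mean comes out lower whenever the two scores differ. That's the point of using it: a model can't hide a weak benchmark behind a strong one as easily.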
