Any benchmarks? | alt Hacker News

nubg • yesterday at 8:02 PM

Any benchmarks?

Replies

The main frontier models are all up on https://arcprize.org/tasks

Barely any of them break 0% on any of the demo tasks. Claude Opus 4.6 comes out on top with a few scores under 3%, Gemini 3.1 Pro gets two nonzero scores, and the others (GPT-5.4 and Grok 4.20) get 0% on everything.
