gpt5 • today at 7:59 AM • 1 reply

ARC-AGI isn't perfect, but it helps demonstrate the gap. I'm sure all companies optimize their models for this benchmark given its dominance.

snemvalts • today at 2:43 PM

What about other benchmarks? Benchmarks where the contents are freely available have become useless for evaluating models.
