Hacker News

gpt5 today at 7:59 AM

ARC-AGI isn't perfect, but it helps demonstrate the gap. I'm sure all companies optimize their models for this benchmark given its dominance.


Replies

snemvalts today at 2:43 PM

What about other benchmarks? Benchmarks where the contents are freely available have become useless for evaluating models.