Hacker News

jronak, yesterday at 7:34 PM

Did you look at ARC AGI 2? Codex might be overfit to Terminal Bench.


Replies

tedsanders, yesterday at 7:41 PM

ARC AGI 2 has a training set that model providers can choose to train on, so I really wouldn't recommend using it as a general measure of coding ability.
