Here's a good thread running over a month or more, updated as each new model comes out:
https://bsky.app/profile/pekka.bsky.social/post/3meokmizvt22...
tl;dr - Pekka says ARC-AGI-2 is now toast as a benchmark.
If you look at the problem space, it's easy to see why it's toast: maybe there's intelligence in there, but it's hardly general.