Hacker News

6thbit, yesterday at 8:15 PM

It's not clear to me how this differs from v2?


Replies

ACCount37, yesterday at 8:30 PM

They stacked the deck. Where v2 was still rule inference + spatial reasoning, a bit like a juiced-up Raven's Progressive Matrices, v3 adds a whole new multi-turn explore/exploit agentic dimension on top.

Given how hard even pure v2 was for modern LLMs, I'm not surprised to see v3 crush them. But that won't last.

jasonjmcghee, yesterday at 8:32 PM

v2 was a static fill-in-the-blank task, whereas v3 is interactive.

There's world state that you can change, not just pixels to place.

Here's v2:

https://arcprize.org/tasks/ce602527
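
To make the static vs. interactive distinction from this thread concrete, here is a toy Python sketch. It is not the ARC API or a real task: `solve_static`, `ToyEnv`, and `explore_then_exploit` are made-up names, and the "mirror each row" rule and the 1-D world are assumptions chosen only for illustration.

```python
from dataclasses import dataclass

Grid = list[list[int]]


def solve_static(test_input: Grid) -> Grid:
    """v2-style, as described above: one grid in, one answer grid out."""
    # Hypothetical rule, assumed to have been inferred from examples:
    # mirror each row left to right.
    return [row[::-1] for row in test_input]


@dataclass
class ToyEnv:
    """v3-style, as described above: world state that actions can change."""
    position: int = 0
    goal: int = 3
    steps: int = 0

    def observe(self) -> int:
        # The agent only sees its distance to the goal, not the goal itself.
        return abs(self.goal - self.position)

    def step(self, action: str) -> tuple[int, bool]:
        # Each action mutates the world state and costs one step.
        self.position += {"left": -1, "right": 1}[action]
        self.steps += 1
        return self.observe(), self.position == self.goal


def explore_then_exploit(env: ToyEnv, max_steps: int = 20) -> bool:
    # Explore: try every action once and record how the observation changes.
    effects: dict[str, int] = {}
    last = env.observe()
    for action in ("left", "right"):
        obs, done = env.step(action)
        if done:
            return True
        effects[action] = obs - last
        last = obs
    # Exploit: repeat whichever action reduced the distance the most.
    best = min(effects, key=effects.get)
    for _ in range(max_steps):
        _, done = env.step(best)
        if done:
            return True
    return False


if __name__ == "__main__":
    print(solve_static([[1, 2], [3, 4]]))
    env = ToyEnv()
    print(explore_then_exploit(env), env.steps)
```

The static solver never touches any state, while the agent has to spend turns probing the environment before it can act usefully, which is roughly the extra dimension the comments above describe.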