Hacker News

dinkblam · yesterday at 8:03 PM

what is the evidence that being able to play games equates to AGI?


Replies

modeless · yesterday at 8:25 PM

The test doesn't prove you have AGI. It proves you don't have AGI. If your AI can't solve these problems that humans can solve, it can't be AGI.

Once the AIs solve this, there will be another ARC-AGI. And so on until we can't find any more problems that can be solved by humans and not AI. And that's when we'll know we have AGI.

ACCount37 · yesterday at 8:10 PM

None whatsoever.

It's a "let's find a task humans are decent at, but modern AIs are still very bad at" kind of adversarial benchmark.

The exact coverage of this one: spatial reasoning across multiple turns, and agentic explore/exploit with rule inference and preplanning. It's directly targeted at the weaknesses of the current generation of LLMs.

furyofantares · yesterday at 8:09 PM

There isn't a strict definition of AGI, there's no way to find evidence for what equates to it, and besides, things like this are meant only as likely necessary conditions.

Anyway, from the article:

> As long as there is a gap between AI and human learning, we do not have AGI.

This seems like a reasonable requirement. Something I think about a lot with vibe coding is that unlike humans, individual models do not get better within a codebase over time, they get worse.

arscan · yesterday at 8:14 PM

I think the idea is that if there is some cognitive task that is trivial for humans but that they cannot perform, then we can state they haven't reached 'AGI'.

It used to be easy to build these tests. I suspect it’s getting harder and harder.

But if we run out of ideas for tests that are easy for humans but impossible for models, it doesn’t mean none exist. Perhaps that’s when we turn to models to design candidate tests, and have humans be the subjects to try them out ad nauseam until no more are ever uncovered? That sounds like a lovely future…

observationist · yesterday at 8:26 PM

The evolution of the test has been driven partly by the evolution of AI capabilities. On the most skeptical view, the types of puzzles AI has trouble solving fall in the domain of capabilities that might require AGI to solve.

Updating the tests specifically in the areas AI has trouble with creates a progressive feedback loop that drives AI development forward. There's no known threshold, well-defined capability, or particular skill that anyone can point to and say "that! That's AGI!" The best we can do right now is a direction. Solving an ARC-AGI test moves the capabilities of that AI some increment closer to the AGI threshold, but there's no good indication as to whether solving a particular test means an AI is 15% closer to AGI or 0.000015% closer.

It's more or less a best-effort empiricist approach, since we lack a theory of intelligence that provides useful direction (as opposed to a formalization like AIXI, which is way too broad to be useful in the context of developing AGI).

sva_ · yesterday at 8:06 PM

That is not the claim. It is a necessary condition, but not a sufficient one.

futureshock · yesterday at 8:12 PM

The evidence is that humans are able to win these games. AGI is usually defined as the ability to do any intellectual task about as well as a highly competent human could. The point of these ARC benchmarks is to find tasks that humans can do easily and AI cannot, thus driving new reasoning competencies as companies race each other to beat human performance on the benchmark.
