Hacker News

fc417fc802 yesterday at 9:30 PM

Those are supposed to be issues? After reading your list my impression of ARC-AGI has gone up rather than down. All of those things seem like the right way to go about this.


Replies

red75prime today at 5:44 AM

No, those aren't issues. But it's good to know the meaning of those numbers we get. For example, 25% is about the average human level (on this category of problems). 100% is either top human level or superhuman level or the information-theoretically optimal level.

girvo yesterday at 10:30 PM

Yeah I'm quite surprised as to how all of those are supposed to be considered problems. They all make sense to me if we're trying to judge whether these tools are AGI, no?

stingraycharles today at 9:48 AM

“no harness at all” might be an issue, though, as these types of benchmarks are often gamified and then models perform great on them without actually being better models.

stonogo today at 1:43 AM

They are severe problems if your income is tied to LLM hype generation.