Hacker News

daveguy, yesterday at 3:05 PM

Source? I haven't seen anything like that for ARC-AGI performance.

Also, if it makes that big of a difference, then make a renderer for your agent that looks like the web page, have it solve them in the graphical interface, and funnel the results to the API. I guarantee you won't get better performance, because the AGI is going to have to "understand" that the raw data can be represented as a 2D matrix regardless of whether it gets a 2D matrix of pixels or a 2D matrix of enumerated values in JSON. If anything, the pixel version makes it a more difficult problem for an AI system that "speaks" in tokens.
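To make the pixels-versus-JSON point concrete, here is a rough Python sketch, not taken from the ARC tooling. The palette, the cell size, and the helper names `grid_to_pixels` and `pixels_to_grid` are all made up for illustration. It renders a small JSON grid as a block of RGB pixels and then recovers the original grid from those pixels, which suggests both forms carry the same 2D structure.

```python
import json

# Hypothetical palette: maps each enumerated cell value to an RGB color.
PALETTE = {0: (0, 0, 0), 1: (0, 116, 217), 2: (255, 65, 54)}


def grid_to_pixels(grid, palette, cell_size):
    """Render each grid cell as a cell_size x cell_size block of RGB pixels."""
    pixels = []
    for row in grid:
        pixel_row = []
        for value in row:
            # Repeat the cell's color horizontally.
            pixel_row.extend([palette[value]] * cell_size)
        # Repeat the whole pixel row vertically.
        for _ in range(cell_size):
            pixels.append(list(pixel_row))
    return pixels


def pixels_to_grid(pixels, palette, cell_size):
    """Recover the enumerated grid by sampling one pixel per cell."""
    inverse = {color: value for value, color in palette.items()}
    return [
        [inverse[pixels[r][c]] for c in range(0, len(pixels[0]), cell_size)]
        for r in range(0, len(pixels), cell_size)
    ]


# The JSON form: a 2D matrix of enumerated values.
grid = json.loads("[[0, 1, 0], [2, 2, 1]]")

# The pixel form: the same matrix, upscaled and colored.
pixels = grid_to_pixels(grid, PALETTE, cell_size=4)

# Round trip back to the enumerated form.
recovered = pixels_to_grid(pixels, PALETTE, cell_size=4)
```

The round trip is lossless here only because the sketch assumes a known palette and a fixed cell size; a real rendered page would add layout, borders, and scaling on top of that.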


famouswaffles, yesterday at 4:25 PM

That score is in the ARC technical paper [1]. It's the full benchmark score using this harness [2] (which is just open code with read, grep, and bash tools).

This is already a solved benchmark. That's why the scoring is so convoluted and why a self-proclaimed agent benchmark won't allow basic agent tools. ARC has always been a bit of a nothingburger of a benchmark, but this takes the cake.

[1] https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf

[2] https://blog.alexisfox.dev/arcagi3