Don't you see the massive problem with requiring visual input? Are blind people not intelligent because they cannot solve ARC-AGI-3 without a "harness"?
A theoretical text-only superintelligent LLM could prove the Riemann hypothesis but fail ARC-AGI-3 and won't even be AGI according to this benchmark...
Well, it would be AGI if you could connect a camera to it to solve it, similar to how blind people would be able to solve it if you restored their eyesight. But if the lack of vision is a fundamental limitation of their architecture, then it seems more fair not to call them AGI.
Think of it as spatial input, not visual. Blind people do have spatial inputs, and high spatial intelligence.