Hacker News

fc417fc802 · yesterday at 7:25 AM

That's impressive. I'm also a bit surprised - I wouldn't have expected it to be trained much at all on that sort of visual input task. I think I'd be similarly surprised to learn that a frontier model was particularly good at playing retro videogames or actuating a robot for example.

However, if it can't figure out how to render the JSON to a visual on its own, does it really qualify as AGI? I'd still say the benchmark is doing its job here. Granted, it's not a perfectly even playing field in that case, but I think the goal is to test for progress towards AGI, not to host a fair tournament.


Replies

rfoo · yesterday at 7:36 AM

> However, if it can't figure out how to render the JSON to a visual on its own, does it really qualify as AGI? I'd still say the benchmark is doing its job here.

Can you render a serialized JSON text blob to a visual using only your brain? The model can't do any better than that: no harness means no tools at all - no way to, e.g., implement a visualizer in a programming language and run it.

Why don't human testers receive the same JSON text blob with no visualizer? As it stands, it's like giving human testers a harness (a playable visualizer) while deliberately crippling it for the model.
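For illustration, the kind of visualizer the comment says the model is barred from writing is only a few lines of code. Here is a minimal sketch in Python for a hypothetical ARC-style grid encoding (a JSON list of rows of small integers); the format, function name, and character palette are assumptions for the example, not details from the thread:

```python
import json

def render_grid(blob: str) -> str:
    """Render a JSON-encoded grid of small integers as text,
    so its 2D structure is visible at a glance."""
    grid = json.loads(blob)
    # Map each cell value to a distinct character.
    palette = ".#ox+*@%&$"
    return "\n".join(
        "".join(palette[cell % len(palette)] for cell in row)
        for row in grid
    )

example = "[[0, 0, 1], [0, 1, 0], [1, 0, 0]]"
print(render_grid(example))
# ..#
# .#.
# #..
```

The point being: a human tester sees the rendered picture for free, while the model, with no harness, is handed only the raw string on the first line.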
