throwuxiytayq • last Friday at 7:25 AM

The whole point of this benchmark is that it asks the model to work in a modality it is not trained in and does not understand well. The result is largely meaningless. This is just like the people who are endlessly surprised by the fact that a raw LLM does not work with numbers well, or miscounts letters. In short, this test benchmarks the intelligence of the person running it, not of the model.

Replies

cedws • last Sunday at 6:53 PM

The rasterised SVG is just a different representation of the same data. A sufficiently advanced LLM may not need to 'see' the rasterised image to be able to draw a good picture. A human could draw a very basic image through raw SVG just by mentally plotting points.
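To make the "mentally plotting points" idea concrete, here is a minimal sketch in Python. Nothing in it comes from the benchmark being discussed: the helper name `polygon_svg`, the 100x100 grid, and the house coordinates are all assumptions chosen for illustration. It only shows that a handful of hand-picked coordinates is already a complete, valid SVG drawing, with no rasterised image involved at any point.

```python
import xml.etree.ElementTree as ET


def polygon_svg(points, size=100, fill="none", stroke="black"):
    """Return a minimal SVG document containing a single polygon.

    `points` is a list of (x, y) pairs on a size-by-size grid.
    This helper is hypothetical and exists only for this example.
    """
    coords = " ".join(f"{x},{y}" for x, y in points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">'
        f'<polygon points="{coords}" fill="{fill}" stroke="{stroke}"/>'
        "</svg>"
    )


# A house outline plotted by hand; in SVG the y axis grows downward,
# so y=20 is the roof peak and y=90 is the ground.
house = [(20, 90), (20, 50), (50, 20), (80, 50), (80, 90)]
svg = polygon_svg(house)

# Parse the markup back to confirm it is well-formed XML.
root = ET.fromstring(svg)
print(svg)
```

Saving the printed string to a `.svg` file and opening it in a browser should show the outline. The point is that the text and the picture carry the same information, which is what the comment above is arguing.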