JohnKemeny • yesterday at 10:36 PM • 6 replies

Is that actually a useful benchmark, or is it just for the laughs? I've never really understood that.

Replies

It was supposed to be a joke. But weirdly it turns out there's a correlation between how good a model is and how good it is at my stupid joke benchmark.

I didn't realize quite how strong the correlation was until I put together this talk: https://simonwillison.net/2025/Jun/6/six-months-in-llms/

OtherShrezzing • yesterday at 10:54 PM

For me, it shows whether LLMs are generalising from their training data. LLMs understand all of the words in the prompt. They understand the spec for SVG better than any human. They know what a bird is. They know what a bike is. They know how to draw (and given access to computer-use could probably ace this test). They can plan and execute on those plans.

Everything here should be trivial for LLMs, but they’re quite poor at it because there’s almost no “how to draw complex shapes in SVG” type content in their training set.

jerpint • today at 8:20 AM

It’s been useful, though given the author’s popularity I suspect it’s only a matter of time before new LLMs become “more aware” of it.

dominicrose • today at 8:32 AM

It's useful because it's SVG, so it's different from other image generation methods.

owebmaster • today at 12:47 AM

I think in 5 years we might have some ultra-realistic pelicans, and this benchmark will turn out to be quite interesting.

mvdtnz • yesterday at 11:10 PM

[flagged]
