It was supposed to be a joke. But weirdly it turns out there's a correlation between how good a model is and how good it is at my stupid joke benchmark.
I didn't realize quite how strong the correlation was until I put together this talk: https://simonwillison.net/2025/Jun/6/six-months-in-llms/
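If you want to try it yourself, here's a rough sketch of the idea using the OpenAI Python SDK. The pelican prompt is the one from the linked talk; the model name is just a placeholder, and the SVG extraction is deliberately naive since models tend to wrap the markup in prose:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder: swap in whatever model you're testing
        messages=[{
            "role": "user",
            "content": "Generate an SVG of a pelican riding a bicycle",
        }],
    )
    text = response.choices[0].message.content

    # Naively pull out the <svg>...</svg> block from the surrounding prose
    start, end = text.find("<svg"), text.rfind("</svg>")
    if start != -1 and end != -1:
        with open("pelican.svg", "w") as f:
            f.write(text[start:end + len("</svg>")])

Open the saved file in a browser and judge the pelican for yourself.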
Always loved this example. What do you think of ASCII art vs SVG?
Since it's not a formal encoding of geometric shapes it's fundamentally different, I guess, but it seems to share some of the challenges of the SVG task: correlating phrases/concepts with an encoded visual representation, without using imagegen.
Do you think that "image encoding" is less useful?
It's a thing I love to try with various models for fun, too.
To be clear, I mean illustration-like content: not text rendered as ASCII art, and not abusing characters as pixels to rasterize an image. The results have been interesting, too, though I'd guess it's less predictable than SVG.
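The way I poke at it is basically just a prompt swap on the same kind of call, something like this sketch (same SDK setup as above; the prompt wording is made up):

    # Same client setup as the SVG sketch above; only the prompt changes
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder again
        messages=[{
            "role": "user",
            "content": "Draw a pelican riding a bicycle as ASCII art",
        }],
    )
    # No rendering step needed: just view the output in a monospace font
    print(response.choices[0].message.content)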