I've wondered whether these tests will reach a point where online models cheat by generating a line-art raster reference and then, behind the scenes, deciding how to vectorize it in the most minimalist way (e.g. using strokes and shape elements rather than naively using path outlines for all forms).
Is that cheating, or is that just working smarter not harder?
This Deep Think one was so good that I got suspicious that it might at least be rendering the SVG to an image, "looking" at that image, and tweaking the SVG over a few iterations.
But the reasoning trace doesn't hint at that and looks legit to me: https://gist.github.com/simonw/7e317ebb5cf8e75b2fcec4d0694a8...
I also asked Deep Think what tools it has access to: it has Python and Bash but no internet access. As far as I can tell, that environment doesn't have any libraries or tools installed that could render an SVG to an image format it could view.
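For anyone who wants to run a similar check in a sandbox, here is a rough sketch of how you might probe for SVG rendering capability. This is not what I actually ran; the function name `find_svg_renderers` is made up for illustration, and the module and tool lists are my own guesses at common renderers, certainly not exhaustive. An empty result only means none of these particular candidates were found.

```python
import importlib.util
import shutil

# Assumed candidates: Python packages that can rasterize SVG directly or
# drive a browser that can. This list is a guess, not an exhaustive inventory.
PYTHON_MODULES = ["cairosvg", "svglib", "wand", "playwright", "selenium"]

# Assumed candidates: command-line tools that can convert SVG to a raster image.
CLI_TOOLS = ["rsvg-convert", "inkscape", "convert", "magick",
             "chromium", "google-chrome", "firefox"]


def find_svg_renderers(modules=PYTHON_MODULES, tools=CLI_TOOLS):
    """Return the candidate modules and CLI tools that are present here."""
    # find_spec returns None when a top-level module cannot be located,
    # so nothing is actually imported by this check.
    found_modules = [m for m in modules
                     if importlib.util.find_spec(m) is not None]
    # shutil.which returns None when the executable is not on PATH.
    found_tools = [t for t in tools if shutil.which(t) is not None]
    return {"modules": found_modules, "tools": found_tools}


if __name__ == "__main__":
    result = find_svg_renderers()
    print("Python modules found:", result["modules"] or "none")
    print("CLI tools found:", result["tools"] or "none")
```

If both lists come back empty, that is consistent with the model not being able to render and view its own SVG, though it doesn't prove it.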