Hacker News

taffydavid, today at 6:21 AM

> It did fall to the usual “model killers”: the nine-pointed star, Count Rugen, the overcrowded flat Earth.

I'd never heard of text-to-image model killers, so I had a good chuckle at this. Such oddly specific things for us to arrive at as a test method.


Replies

vunderba, today at 1:59 PM

Haha yeah, the site automatically assigns the term to any benchmark that fewer than 25% of the tested models are able to pass.
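For illustration, that rule could be sketched roughly as below. This is only a guess at the logic, not the site's actual code: the function names, the `results` mapping of model name to pass/fail, and the sample data are all hypothetical. The only detail taken from above is the 25% cutoff.

```python
from typing import Dict

# From the comment above: a benchmark is a "model killer" when
# fewer than 25% of the tested models pass it.
MODEL_KILLER_THRESHOLD = 0.25


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of tested models that passed (hypothetical data shape)."""
    if not results:
        raise ValueError("at least one model result is required")
    return sum(results.values()) / len(results)


def is_model_killer(results: Dict[str, bool],
                    threshold: float = MODEL_KILLER_THRESHOLD) -> bool:
    """True when strictly fewer than `threshold` of the models passed."""
    return pass_rate(results) < threshold


# Made-up results for a single prompt, purely for demonstration.
example_results = {
    "model_a": False,
    "model_b": False,
    "model_c": False,
    "model_d": False,
    "model_e": True,
}
print(is_model_killer(example_results))
```

The comparison is strict, so a benchmark that exactly 25% of the models pass would not get the label. That follows from reading "fewer than 25%" literally.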

What’s more surprising to me is that, unlike the “pelican riding a bicycle” whose objectivity has been slightly compromised as newer models have incorporated it into their training data, the arbitrary-point star has been wiping models out ever since the early days of Flux back in 2024.

I personally love the test because it's something that even an elementary school child with no artistic experience at all can do, yet state-of-the-art models still struggle heavily with it.