taffydavid • today at 6:21 AM

> It did fall to the usual “model killers”: the nine-pointed star, Count Rugen, the overcrowded flat Earth.

I'd never heard of text-to-image model killers, so I had a good chuckle at this. Such oddly specific things for us to arrive at as a test method.

Replies

vunderba • today at 1:59 PM

Haha yeah, the site automatically assigns the term to any benchmark that fewer than 25% of the tested models are able to pass.

What’s more surprising to me is that, unlike the “pelican riding a bicycle” test, whose objectivity has been slightly compromised as newer models have incorporated it into their training data, the arbitrary-point star has been wiping models out ever since the early days of Flux back in 2024.

I personally love the test because it's something that even an elementary school child with no artistic experience at all can do, yet state-of-the-art models struggle heavily with it.
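
For anyone curious, the 25% rule mentioned above could be sketched roughly like this. This is only a guess at the logic, not the site's actual code: the function name `is_model_killer`, the list-of-booleans input, and the handling of an empty result list are all assumptions made for illustration.

```python
def is_model_killer(results, threshold=0.25):
    """Return True if fewer than `threshold` of the tested models passed.

    `results` is a hypothetical list of booleans, one per tested model
    (True = the model passed the benchmark). The 0.25 default mirrors the
    "fewer than 25%" rule described in the comment; everything else is assumed.
    """
    if not results:
        # Assumption: a benchmark with no tested models gets no label.
        return False
    pass_rate = sum(results) / len(results)
    # Strictly "fewer than" the threshold, so exactly 25% does not qualify.
    return pass_rate < threshold
```

With five models and one pass (20%), the benchmark would be labeled a model killer; with four models and one pass (exactly 25%), it would not.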
