[flagged]
It's just Simon Willison (the person you are replying to) who always makes a pelican, as his personal flippant benchmark. It's not that deep.
No benchmark will be perfect, especially if it's public, but it's a fun experiment for seeing visually how these models get better and better.
Why is it so wrong?
Thanks for the "scientific air" remark, that gave me a genuine LOL.