How likely is it that this problem is already in the training set by now?
For every combination of animal and vehicle? Very unlikely.
The beauty of this benchmark is that it takes all of two seconds to come up with your own unique one. A seahorse on a unicycle. A platypus flying a glider. A man’o’war piloting a Portuguese man of war. Whatever you want.
You can always ask for a tyrannosaurus driving a tank.
I've heard it posited that the reason the frontier companies are frontier is that they have custom data and evals. This is what I would do too.
If anyone trains a model on https://simonwillison.net/tags/pelican-riding-a-bicycle/ they're going to get some VERY weird-looking pelicans.