I wonder when "pelican riding a bicycle" will become useless as an evaluation task. The point was that it was something weird nobody had ever really thought about before: not in the benchmarks, and not even something a team would run internally. But now I'd bet that internally this is one of the new Shirley Cards.
Pelicanmaxxing
Yeah, try it with something else, or e.g. add a tiger to the back seat.
I mean, look at the result where he asked about a unicycle: the model couldn't even keep the spokes inside the wheels. That would be rudimentary if it had "learned" what it means to draw a bicycle wheel and could transfer that to a unicycle.
They’ll hardcode it in 4.8, just like they do when they need to “fix” other issues
Simon has an article on this
https://simonwillison.net/2025/Nov/13/training-for-pelicans-...