Hacker News

jubilanti yesterday at 6:16 PM

I wonder when "pelican riding a bicycle" will become useless as an evaluation task. The point was that it was something weird nobody had ever really thought about before: not in the benchmarks, not even something a team would run internally. But now I'd bet that internally this is one of the new Shirley Cards.


Replies

SwellJoe yesterday at 10:03 PM

Pelicanmaxxing

amelius yesterday at 7:59 PM

Yeah try it with something else, or e.g. add a tiger to the back seat.

rafaelmn yesterday at 6:41 PM

I mean, look at the result where he asked about a unicycle: the model couldn't even keep the spokes inside the wheels. That would be rudimentary if it had "learned" what it means to draw a bicycle wheel and could transfer that to a unicycle.

MagicMoonlight yesterday at 7:25 PM

They’ll hardcode it in 4.8, just like they do when they need to “fix” other issues