Hacker News

rcarmo · yesterday at 8:06 PM

I don't think this is a good "benchmark" anymore. It's probably in everyone's training set by now.


Replies

staticassertion · yesterday at 8:20 PM

I think it could still be an interesting benchmark. Like, assuming AI companies are genuinely trying to solve this pelican problem, how well do they solve it? That still seems like a valid question, on the assumption that whatever approach they take could generalize, which seems plausible.