rcarmo • yesterday at 8:06 PM • 1 reply

I don't think this is a good "benchmark" anymore. It's probably in everyone's training set by now.

Replies

staticassertion • yesterday at 8:20 PM

I think it could still be an interesting benchmark. Like, assuming AI companies are genuinely trying to solve this pelican problem, how well do they solve it? That seems valid, and the assumption here is that the approach they take could generalize, which seems plausible.
