Hacker News

Stevvo, last Thursday at 8:40 PM

The variance is way too high for this test to have any value at all. I ran it 10 times, and each pelican on a bicycle was a better rendition than that; you could say about half of them were perfect.


Replies

golly_ned, last Thursday at 9:40 PM

Compared to the other benchmarks, which are much more gameable, I trust PelicanBikeEval way more.

getnormality, last Friday at 6:25 AM

Well, the variance is itself interesting.

throwaway102398, last Friday at 2:47 AM

[dead]