Hacker News

gertlabs today at 7:21 AM

Success rate measures the share of code submissions that played the game/environment without failing (compilation errors, breaking game rules, violating the sandbox, etc.), so it makes sense that Python would do better there.

Percentile compares only the submissions that didn't hard-fail. So the two metrics measure somewhat different things, and we incorporate both into the combined score.
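The relationship between the two metrics can be sketched in Python. This is a minimal illustration under stated assumptions, not the site's actual scoring code. It assumes each submission either hard-fails or produces a numeric in-game score. The `Submission` record, the "strictly below" percentile definition, and especially the 50/50 blend in `combined_score` are all hypothetical: the comment only says both metrics feed into the combined score, not how they are weighted.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    # Hypothetical record for one code submission run in the game/environment.
    language: str
    hard_failed: bool  # compilation error, rule violation, sandbox violation, ...
    raw_score: float   # in-game score; only meaningful when hard_failed is False


def success_rate(subs: list[Submission]) -> float:
    """Fraction of submissions that played without hard-failing."""
    if not subs:
        return 0.0
    return sum(not s.hard_failed for s in subs) / len(subs)


def percentile(value: float, population: list[float]) -> float:
    """Percent (0-100) of the population scoring strictly below `value`."""
    if not population:
        return 0.0
    return 100.0 * sum(p < value for p in population) / len(population)


def mean_percentile(subs: list[Submission], field: list[Submission]) -> float:
    """Average percentile of the non-failing `subs`, ranked only against
    the non-failing submissions in the whole `field`."""
    ok_field = [s.raw_score for s in field if not s.hard_failed]
    ok_subs = [s for s in subs if not s.hard_failed]
    if not ok_subs:
        return 0.0
    return sum(percentile(s.raw_score, ok_field) for s in ok_subs) / len(ok_subs)


def combined_score(subs: list[Submission], field: list[Submission],
                   weight: float = 0.5) -> float:
    """Hypothetical blend: weighted mix of success rate and mean percentile (0-100)."""
    return weight * 100.0 * success_rate(subs) + (1 - weight) * mean_percentile(subs, field)


# Toy data (made up for illustration): Python fails less often,
# but its surviving submissions score lower than Rust's.
py = [Submission("python", False, 10.0), Submission("python", False, 20.0),
      Submission("python", False, 5.0), Submission("python", True, 0.0)]
rs = [Submission("rust", False, 30.0), Submission("rust", True, 0.0)]
field = py + rs

print(success_rate(py), mean_percentile(py, field), combined_score(py, field))
print(success_rate(rs), mean_percentile(rs, field), combined_score(rs, field))
```

With this toy data, Python has the higher success rate (0.75 vs 0.5) but the lower percentile among non-failing runs (25.0 vs 75.0), so the combined score depends entirely on how the two are weighted.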


Replies

Yokohiii today at 2:41 PM

Comparing Rust to JavaScript, the gscore distributions are rather similar, while Python falls off. I don't see why Python should be so much worse.
