andai • yesterday at 10:16 PM • 1 reply

Not sure if I'm reading this right, but the "success rate" table for OpenAI models shows Clojure near the bottom. And if I switch provider to Anthropic, success rate for most languages, including Clojure, goes up dramatically.

Replies

gertlabs • yesterday at 10:28 PM

Success rate counts both syntax/compilation failures and environment rule violations against a submission, and it comes almost entirely from one-shot code generations. Percentile shows how well the working submissions perform.
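To illustrate why a language can sit low on success rate yet rank well on percentile, here is a minimal sketch of how two such metrics could be computed. It is an assumption, not gertlabs' actual methodology: the `Submission` fields, the rule that failed submissions are simply dropped from the percentile pool, and the choice to rank each working submission against all working submissions across languages are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Submission:
    language: str
    compiled: bool        # hypothetical flag: passed syntax/compilation checks
    followed_rules: bool  # hypothetical flag: no environment rule violations
    score: float          # hypothetical raw task score


def works(s: Submission) -> bool:
    # A submission "works" only if it compiled AND broke no environment rules.
    return s.compiled and s.followed_rules


def success_rate(subs: list[Submission], language: str) -> float:
    # Share of this language's submissions that worked at all.
    mine = [s for s in subs if s.language == language]
    return sum(works(s) for s in mine) / len(mine) if mine else 0.0


def percentile_rank(score: float, population: list[float]) -> float:
    # Percent of the population scoring strictly below `score`.
    return 100.0 * sum(x < score for x in population) / len(population)


def mean_percentile(subs: list[Submission], language: str) -> Optional[float]:
    # Rank each working submission against ALL working submissions (every
    # language), then average the ranks for this language. Failed submissions
    # lower success_rate but never enter this number.
    pool = [s.score for s in subs if works(s)]
    ranks = [percentile_rank(s.score, pool)
             for s in subs if works(s) and s.language == language]
    return mean(ranks) if ranks else None


# Toy data (made up for illustration only).
subs = [
    Submission("clojure", False, True, 0.0),   # failed to compile
    Submission("clojure", True, True, 90.0),
    Submission("python", True, True, 50.0),
    Submission("python", True, False, 99.0),   # broke an environment rule
    Submission("python", True, True, 70.0),
]

for lang in ("clojure", "python"):
    print(lang, success_rate(subs, lang), mean_percentile(subs, lang))
```

With this toy data, Clojure gets the lower success rate (0.5 vs. about 0.67) but the higher mean percentile (about 66.7 vs. about 16.7). The two columns answer different questions: "did it run?" versus "how good was it when it ran?"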

In long-horizon agentic coding evaluations, strong models fix their syntax errors, so percentile becomes a direct comparison of which language's submissions performed best on average. You can filter for that here: https://gertlabs.com/rankings?provider=openai&mode=agentic_c...
