Hacker News

hmokiguess, today at 6:31 PM

> Run the same agent n times to increase success rate.

Are there benchmarks out there that back this claim?


Replies

danoandco, today at 6:43 PM

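Yes, this is the pass@k metric from code generation research: the probability that at least one of k sampled attempts passes the tests. The relevant paper is "Evaluating Large Language Models Trained on Code" (Chen et al., 2021), which popularized the metric and gave an unbiased estimator for it.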
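For anyone who wants to see the arithmetic, here is a minimal sketch of the unbiased pass@k estimator described in that paper: draw n samples per problem, count the c that pass, and estimate pass@k as 1 - C(n-c, k) / C(n, k). The function name and argument checks are my own choices, not taken from any benchmark harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c passed (illustrative helper)."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failures exist, so every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset of the n samples contains no pass.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


if __name__ == "__main__":
    # Hypothetical numbers: 10 runs, 3 of them passed.
    for k in (1, 5, 10):
        print(k, round(pass_at_k(10, 3, k), 4))
```

Note that pass@k counts a problem as solved if any of the k attempts passes, so it only translates into a higher real-world success rate when you have a way (tests, a verifier) to pick out the passing run.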
