hmokiguess • today at 6:31 PM • 1 reply

> Run the same agent n times to increase success rate.

Are there benchmarks out there that back this claim?

Replies

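Yes, this is the pass@k metric from code generation research: the probability that at least one of k sampled attempts solves the task. The relevant paper is "Evaluating Large Language Models Trained on Code" (Chen et al., 2021), which popularized the metric and gave an unbiased estimator for it.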
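For reference, here is a minimal sketch of the unbiased pass@k estimator as I understand it from that paper: generate n samples per task, count the c that pass, and estimate pass@k as 1 - C(n-c, k) / C(n, k). The function name and the argument checks are my own choices, not code from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c passed.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    a random subset of k samples (out of n) contains no passing sample.
    """
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failures: every subset of size k contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers for illustration: 10 samples, 3 passed.
    for k in (1, 5, 10):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.4f}")
```

With n=10 and c=3, pass@1 is 0.3 while pass@5 is about 0.92, which is the sense in which running the same agent more times raises the chance that at least one run succeeds.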
