Hacker News

jononor, today at 5:16 AM

This seems like a viable eval strategy. Presumably, finding a bug requires some degree of understanding of the code, beyond just information retrieval. However, it probably does not measure things like prompt adherence or the ability to write code that implements a specification.