jononor • today at 5:16 AM • 0 replies • view on HN

This seems like a viable eval strategy. Presumably, finding a bug requires some degree of understanding of the code, beyond just information retrieval. However, it probably does not measure things like prompt adherence or the ability to write code that implements a specification?