
hackernewds, yesterday at 5:04 PM

One would think a model scoring this high on SWE-bench could easily maximize the F1 score on a precision-recall problem. What's the missing piece?
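For reference, here is a minimal sketch of what "maximizing F1" usually means on a binary precision-recall problem: sweep the decision threshold over the predicted scores and keep the one with the highest F1 = 2PR / (P + R). The helper names `f1_score` and `best_f1_threshold` and the toy data are assumptions for illustration only, not anything from the benchmark or model being discussed.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(scores, labels):
    """Try every distinct score as a threshold (predict positive if score >= t)
    and return the (threshold, f1) pair with the highest F1."""
    best_threshold, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f1 = f1_score(tp, fp, fn)
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1


if __name__ == "__main__":
    # Hypothetical toy data: model scores and ground-truth labels.
    scores = [0.1, 0.4, 0.35, 0.8]
    labels = [0, 0, 1, 1]
    print(best_f1_threshold(scores, labels))
```

The brute-force sweep is quadratic in the number of examples, which is fine for a sketch; sorting once and updating the counts incrementally would be the usual choice at scale.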