Hacker News

leerob, today at 5:06 PM

(I work at Cursor) We score well on Terminal-Bench and SWE-bench Multilingual. On DeepSWE we're not so great yet, since it's geared more toward very long-horizon tasks. We're planning to include more public benchmarks in our next model release.