Hacker News

mi_lk today at 12:27 PM | 1 reply

Cursor: Find me another benchmark where Composer 2.5 is a top 10 frontier coding model


Replies

leerob today at 5:06 PM

(I work at Cursor) We score well on Terminal-Bench and SWE-bench Multilingual. DeepSWE, not so great yet, as it's more for very long-horizon tasks. We're planning to include more public benchmarks in our next model release.