jbellis, yesterday at 8:58 PM

M2 was one of the most benchmaxxed models we've seen. Huge gap between its SWE-B results and its performance on tasks it hasn't been trained on. We'll put 2.5 on the list. https://brokk.ai/power-ranking