Hacker News

culi, yesterday at 9:15 PM

Look at the ARC site. The scores of these models are plotted against their "cost per task". All of these huge jumps come with massive increases in cost per task, including Gemini 3.1 Pro, which increased by 4.2x.