Hacker News

PunchTornado, yesterday at 4:23 PM

The biggest increase is LiveCodeBench Pro: 2887. The rest are in line with Opus 4.6, slightly better or slightly worse.


Replies

shmoogy, yesterday at 4:39 PM

but is it still terrible at tool calls in actual agentic flows?