Hacker News

RA_Fisher today at 2:08 PM

In what ways? LM Arena has Opus 4.7 at 1567 ± 7 vs. 1505 ± 10 for GPT-5.5 Codex in code. I'm currently using both.

Admittedly my recent experience tilts toward Opus, now 4.8, but you and others have piqued my interest in GPT-5.5 Codex, so I'm trying that more now.


Replies

spongebobstoes today at 3:43 PM

arena is not a good benchmark, it is very susceptible to sycophancy