GPT 5.5 is quite a big leap; it's a lot better than Opus 4.7 for agentic coding.
Arena only allows very small context sizes, so it's a noisy benchmark for what we care about IRL.
Better in what ways? I'm just curious about your experience.