logoalt Hacker News

Filligreetoday at 11:29 AM4 repliesview on HN

Everyone who's used Opus knows it's better than the others in a way that isn't captured by the benchmarks. I would describe it as taste.

Lots of models get really close on benchmarks, but benchmarks only tell us how good they are at solving a defined problem. Opus is far better at solving ill-defined ones.


Replies

torginustoday at 3:55 PM

Dunno, I was using Cursor today and for some reason it decided to swith to GPT 5.3 at some point and I didn't even notice. I was sure that Opus is much better, but who knows?

ACCount37today at 1:44 PM

One of the main edges Anthropic has is that "personality tuning" gap. "Nice to use" is a differentiator when raw performance isn't.

OpenAI can sometimes get an edge over Anthropic in hard narrow STEM tasks. I trust benchmarks over vibes there - and the benchmarks show the teams trading blows release after release. Tracking Claude Code vs OpenAI Codex on SWE-bench Verified feels like watching the back alley knife fight of the AI frontier.

But the vibe of "how easy is that model to interact with" and "how easy it is to get it to do what you want it to" does matter a lot when you are the one doing the interacting. And Opus makes for a damn good daily driver.

cmrdporcupinetoday at 2:27 PM

At this point it's frankly not a fair comparison since DeepSeek 3.2 is now many months old and we're waiting for a newer model which has been rumoured as "any day now" since February. (We'll see).

GLM5, the largest Qwen 3.5 model, and Kimi K2.5 are more fair comparisons, though they are, yes, a bit behind. They're more than capable for routine operations though.

Anyways, I'm back to using Opus & Claude Code after a month on Codex/GPT5.3 and 5.4 and it's frankly a rather obvious downgrade. Anthropic is behind OpenAI at this point on coding models, and there's nothing to say they couldn't fall behind the Chinese models as well.

The moat is very shallow. After the events of the last two weeks there's likely a significant % of international capital very interested in breaching it. I know I would like to see this... Anthropic basically said F U to any non-Americans, and OpenAI is ... yeah.

coldteatoday at 11:36 AM

>Everyone who's used Opus knows it's better than the others in a way that isn't captured by the benchmarks. I would describe it as taste.

Ah, the "trust me bro" advantage. Couldn't it just be brand identity and familiarity?

show 2 replies