
epolanski, today at 9:33 AM

Not only that, but since the release of 5.4 and 5.3 Codex I've been running them in parallel, and I've been let down by Opus 4.6 with maximum thinking far more often than by the OpenAI models.

In fact I'm more and more inclined to run my own benchmarks from now on, because I seriously distrust those I see online.

Even if the benchmarks are indeed valid, they just don't reflect my use cases, my usage, or the models' ability to navigate my projects and dependencies.