Hacker News

macNchz yesterday at 3:06 PM

Opus 4.5 to 4.6 was pretty incremental; I didn't see much of a difference.

The big coding model moments in recent recollection, IMO, were something like:

- Sonnet 3.5 update in October 2024: ability to generate actually-working code using context from a codebase became genuinely feasible.

- Claude 4 release in May 2025: big tool calling improvements meant that agentic editors like Claude Code could operate on a noticeably longer leash without falling apart.

- Gemini 3 Pro, Claude 4.5, GPT 5.2 in Nov/Dec 2025: with some caveats these were a pretty major jump in the difficulty and scale of tasks that coding assistants are able to handle, working on much more complex projects over longer time scales without supervision, and testing their own work effectively.


Replies

danso yesterday at 3:47 PM

Maybe they're like me, who didn't spend a lot of time investigating Claude until 4.6 launched and the hype was enough to be the tipping point to invest energy. I do know that I've been having good/great results with Opus 4.6 and the CLI, but after an hour or so, it'll suddenly forget that the codebase has tab-formatted files and burn up my quota trying to figure out how to read text files. And apparently this snafu has been around since at least late last year [0]. Again, I can't complain about the overall speed and quality for my relatively light projects, I'm just fascinated by people who say their agents can get through a whole weekend without supervision, when even 4.6 appears to randomly get tripped up in a very rookie way?

[0] https://github.com/anthropics/claude-code/issues/11447

wongarsu yesterday at 4:36 PM

This is also supported by the Opus degradation tracker [1]. The dotted line is when they switched from Opus 4.5 to 4.6. There's no statistically significant difference on the tested benchmark.

[1] https://marginlab.ai/trackers/claude-code-historical-perform...