Anonymous request-token comparisons from Opus 4.6 and Opus 4.7

439 points • by anabranch • yesterday at 4:05 PM • 445 comments

Comments

Was shocked to see phone verification roll out just last month as well... yikes.

45% is brutal if you're building on top of these models as a bootstrapped founder. The unit economics just don't work anymore at that price point for most indie products.

What I've been doing is running a dual-model setup: I use the cheaper/faster model for the heavy lifting where quality variance doesn't matter much, and route to the expensive one only when the output is customer-facing and quality is non-negotiable. It cuts costs significantly without users noticing any difference.
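A minimal sketch of that kind of routing, assuming each task is already tagged with whether its output is customer-facing. The model names, the `Task` type, and the `route` / `split_by_model` helpers are hypothetical placeholders, not real model IDs or any provider's API:

```python
from dataclasses import dataclass

# Hypothetical placeholder model identifiers; substitute whatever models you actually use.
CHEAP_MODEL = "cheap-fast-model"
EXPENSIVE_MODEL = "expensive-model"


@dataclass
class Task:
    prompt: str
    customer_facing: bool  # True if the output is shown directly to customers


def route(task: Task) -> str:
    """Pick a model: the expensive one only when the output is customer-facing."""
    return EXPENSIVE_MODEL if task.customer_facing else CHEAP_MODEL


def split_by_model(tasks: list[Task]) -> dict[str, list[Task]]:
    """Group tasks by the model they would be routed to."""
    groups: dict[str, list[Task]] = {CHEAP_MODEL: [], EXPENSIVE_MODEL: []}
    for task in tasks:
        groups[route(task)].append(task)
    return groups


if __name__ == "__main__":
    tasks = [
        Task("Summarize these internal logs", customer_facing=False),
        Task("Draft the reply to this support ticket", customer_facing=True),
        Task("Classify these rows", customer_facing=False),
    ]
    for model, batch in split_by_model(tasks).items():
        print(model, len(batch))
```

This only shows the routing decision; a real setup would also need some way to check that the cheap model's output is good enough, plus a fallback when it isn't.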

The real risk is that pricing like this pushes smaller builders toward open models or Chinese labs like Qwen, which I suspect isn't what Anthropic wants long term.

dackdel • yesterday at 5:26 PM

Releases 4.8 and deletes everything else. And now 4.8 costs 500% more than 4.7. I wonder what it would take for people to start using Kimi or Qwen or others like them.

justindotdev • yesterday at 4:56 PM

I think it is quite clear that staying with Opus 4.6 is the way to go. On top of the inflation, 4.7 is quite... dumb. I think they lobotomized this model while prioritizing cybersecurity and blocking people from performing potentially harmful security-related tasks.

ai_slop_hater • yesterday at 4:56 PM

Does anyone know what changed in the tokenizer? Does it output multiple tokens for things that were previously one token?

gverrilla • yesterday at 7:04 PM

Yeah, I'm seriously considering dropping my Max subscription unless they do something in the next few days, like releasing a cheap and powerful Sonnet 4.7.

varispeed • yesterday at 6:21 PM

I spent one day with Opus 4.7 trying to fix a bug. It just ran in circles despite having the problem "in front of its eyes" with all the supporting data, a thorough description of the system, a test harness that reproduces the bug, etc. While I still believe 4.7 is much "smarter" than GPT-5.4, I decided to give it a go. It was giving me dumb answers and going off the rails. After I accused it many times of being a fraud and doing it on purpose so that I would spend more money, it fixed the bug in one shot.

Having had a taste of unnerfed Opus 4.6, I think they have a conflict of interest: if they let models give the right answer the first time, people will spend less time with them and less money, but if they make the model artificially dumber ("progressive reasoning", if you will), people get frustrated but spend more money.

It is likely happening because the economics don't work. Running a comparable model at comparable speed for an individual is prohibitively expensive. Now scale that to millions of users: something's gotta give.

DeathArrow • yesterday at 6:18 PM

We (my wallet and I) are pretty happy with GLM 5.1 and MiniMax 2.7.

micromacrofoot • yesterday at 5:18 PM

The latest Qwen actually performs a little better on some tasks, in my experience.

The latest Claude still fails the car wash test.

QuadrupleA • yesterday at 6:25 PM

It definitely seems like AI money got tight in the last month or two: the free beer is running out and enshittification has begun.

fny • yesterday at 5:04 PM

I'm going to suggest what's going on here is Hanlon's Razor for models: "Never attribute to malice that which is adequately explained by a model's stupidity."

In my opinion, we've reached some ceiling where more tokens lead only to incremental improvements. A conspiracy seems unlikely given that all providers are still competing for customers, and a 50% token increase drives infra costs up dramatically too.

mvkel • yesterday at 5:21 PM

The cope is real with this model. Needing an instruction manual to learn how to prompt it "properly" is a glaring regression.

The whole magic of (pre-nerfed) 4.6 was how it magically seemed to understand what I wanted, regardless of how perfectly I articulated it.

Now Anth is pitching the need to explicitly define instructions as a "feature"?!

bparsons • yesterday at 5:51 PM

Had a pretty heavy workload yesterday and never hit the limit on Claude Code. Perhaps they allowed more tokens for the launch?

Claude Design, on the other hand, seemed to eat through its own separate usage limit very fast. I hit the limit this morning in about 45 minutes on a Max plan. I assume they are going to end up spinning that product off as a separate service.

therobots927 • yesterday at 4:53 PM

Wow, this is pretty spectacular. And with the losses Anthro and OAI are running, don’t expect this trend to change. You will get incremental output improvements for a dramatically more expensive subscription plan.

alekseyrozh • yesterday at 6:33 PM

Is it just me? I don't feel a difference between 4.6 and 4.7.

matt3210 • yesterday at 5:10 PM