Hacker News

gAI yesterday at 5:15 PM, 8 replies

4.7 was the first time I had to resort to using the previous version (4.6) for most use cases. Hoping 4.8 rectifies this.


Replies

ishurand4 yesterday at 7:25 PM

They only showed the benchmarks it improved on, but it regressed on many more, such as the MRCR benchmark: "On multi-round coreference/context recall tests (often cited as MRCR or long-text retrieval benchmarks), Opus 4.7 reportedly dropped from roughly 78.3% down to 32.2% compared to Opus 4.6."

merlindru yesterday at 5:19 PM

Same. 4.7 felt like a definite regression

ruairidhwm yesterday at 10:03 PM

I found that Haiku outperformed Sonnet on some tasks. I don't want to blog-spam, but if anyone is interested: https://www.ruairidh.dev/blog/sonnet-4-6-drops-format-rule-o...

sonink today at 11:53 AM

Same here - we never bumped to 4.7 in our agentic app. We continue to use 4.6.

petterroea yesterday at 6:01 PM

Same. 4.7 has done some incredibly stupid things.

rhubarbtree yesterday at 5:22 PM

Same. So happy when I found that option.

tanepiper yesterday at 8:40 PM

Yep, until 1st June 4.6 is still x1 on Copilot, but it will jump up quite a bit in cost. 4.7 was already highly priced, and the output was frankly terrible.

It still seems that trying to build general models is mostly cost-prohibitive: the frontier model providers and resellers are repricing in such a way that the return on investment is dropping as developers and users become more cautious about burning through their limits.

I'm still of the opinion that models like 4.6 don't need to be improved on. Rather, they need to be better integrated with more domain-specific models in agentic flows.

dezsirazvan yesterday at 8:12 PM

same!