gAI • yesterday at 5:15 PM • 8 replies

4.7 was the first time I had to resort to using the previous version (4.6) for most use cases. Hoping 4.8 rectifies this.

Replies

ishurand4 • yesterday at 7:25 PM

They only showed the benchmarks it improved on, but it regressed on many more, such as the MRCR benchmark: "On multi-round coreference/context recall tests (often cited as MRCR or long-text retrieval benchmarks), Opus 4.7 reportedly dropped from roughly 78.3% down to 32.2% compared to Opus 4.6."

merlindru • yesterday at 5:19 PM

Same. 4.7 felt like a definite regression.

ruairidhwm • yesterday at 10:03 PM

I found that Haiku outperformed Sonnet on some tasks. I don't want to blog-spam, but if anyone is interested: https://www.ruairidh.dev/blog/sonnet-4-6-drops-format-rule-o...

sonink • today at 11:53 AM

Same here: we never upgraded to 4.7 in our agentic app and continue to use 4.6.

petterroea • yesterday at 6:01 PM

Same. 4.7 has done some incredibly stupid things.

rhubarbtree • yesterday at 5:22 PM

Same. So happy when I found that option.

tanepiper • yesterday at 8:40 PM

Yep, until 1 June, 4.6 is still a 1x multiplier on Copilot, but it will jump up quite a bit in cost. 4.7 was already highly priced, and the output was frankly terrible.

It still seems that building general models is mostly cost-prohibitive: the frontier model providers and resellers are repricing in such a way that the return on investment is dropping, as developers and users become more cautious about burning through their limits.

I'm still of the opinion that models like 4.6 don't need to be improved on; rather, they need to be better integrated with more domain-specific models in agentic flows.

dezsirazvan • yesterday at 8:12 PM

Same!