supern0va • yesterday at 5:22 PM • 2 replies • view on HN

Interestingly enough, 4.7 actually did regress on a few benchmarks from 4.6, so it's more than just vibes.

Replies

It seems like a lot of things fed into that. Anthropic couldn't keep up with the compute costs when they got a huge influx of users, so effort level defaults apparently got turned down. (Looks like we have direct effort control in the web interface now, which I'm thrilled about!) Adaptive Thinking, while usually cheaper for them, seems less robust than Extended Thinking. And this part is just vibes, but the alignment on 4.7 feels too stiff. I understand wanting the model to push back more, but 4.7 seems to push back reflexively, even in situations where pushing back is just odd.

ACCount37 • yesterday at 5:30 PM

4.7 is a different base model from 4.6, so it's possible that they introduced regressions with pre-training changes, or undercooked the post-training stage.
