Hacker News

supern0va yesterday at 5:22 PM

Interestingly enough, 4.7 actually did regress on a few benchmarks from 4.6, so it's more than just vibes.


Replies

gAI yesterday at 5:27 PM

It seems like a lot of things fed into that. Anthropic couldn't keep up with the compute costs when they got a huge influx of users. Presumably as a result, the default effort levels got turned down. (Looks like we have direct effort control in the web interface now, and I'm thrilled about that!) Adaptive Thinking, while usually cheaper for them, seems less robust than Extended Thinking. And this part is just vibes, but the alignment on 4.7 feels too stiff. I understand wanting the model to push back more, but 4.7 seems to push back reflexively, even in situations where doing so is just odd.

ACCount37 yesterday at 5:30 PM

4.7 is a different base model from 4.6, so it's possible that they introduced regressions with pre-training changes, or undercooked the post-training stage.
