Hacker News

Bnjoroge, yesterday at 5:29 PM (1 reply)

For long-running tasks, yes, 4.7 has been a noticeable improvement. It goes off the rails a lot less than 4.6 does. For shorter windows, I haven't felt as much of a difference, and I agree that the harness improvements have been the biggest lever.


Replies

csvance, yesterday at 7:12 PM

When doing big, long-running workflows, especially with plan mode, 4.7 was a clear improvement. It’s considerably worse for underspecified tasks, and in explanatory discussions it responds to a couple of sentences with 10+ paragraphs.
