Hacker News

devonkelley · today at 3:31 AM · 0 replies

The 25% regression rate on the best model is the number people should be sitting with here. That means 1 in 4 commits from your agent breaks something that used to work. On any human team, that would get you a serious conversation. We keep benchmarking agents like they're taking a test, but the actual failure mode in production is a slow accumulation of regressions that nobody catches until the whole thing is on fire.
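To put rough numbers on the "slow accumulation" point, here's a back-of-the-envelope sketch. It assumes (and this is a big simplification) that each commit regresses independently with the same 25% probability. Under that assumption, the chance that at least one of n commits broke something is 1 - 0.75^n, and the expected number of regressions is just 0.25 * n. The function names are made up for illustration.

```python
def p_at_least_one_regression(rate: float, commits: int) -> float:
    """Chance that at least one of `commits` commits regresses,
    assuming each regresses independently with probability `rate`."""
    return 1 - (1 - rate) ** commits


def expected_regressions(rate: float, commits: int) -> float:
    """Expected number of regressing commits under the same independence assumption."""
    return rate * commits


if __name__ == "__main__":
    for n in (1, 4, 10, 20):
        p = p_at_least_one_regression(0.25, n)
        e = expected_regressions(0.25, n)
        print(f"{n:>2} commits: P(>=1 regression) = {p:.3f}, expected regressions = {e:.1f}")
```

Even with only 10 agent commits, that works out to roughly a 94% chance that something regressed, with about 2.5 regressions expected. Real regressions aren't independent, so treat this as an illustration rather than a model.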