Hacker News

sarchertech, today at 7:23 AM

It has nothing to do with whether small mistakes are allowable or not. It’s about customers needing a consistent product.

The in-code tests and the expectations/assumptions about the product that your users have are wildly different. If you allow agents to make changes restricted only by those tests, they’re going to constantly make changes that break customer workflows and cause noticeable jank.
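A minimal Python sketch of that gap, with everything in it hypothetical (the `export_v1`/`export_v2` functions, the CSV layout, and the customer macro are made up for illustration): a refactor reorders the columns of an export, the test that parses by header name still passes, and a customer workflow that reads the third column quietly starts getting the wrong value.

```python
import csv
import io

ROWS = [{"id": 1, "name": "Ada", "total": 30}]


def _write(rows, fields):
    # Serialize rows to CSV text with the given column order.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def export_v1(rows):
    # Original behavior: columns are id, name, total.
    return _write(rows, ["id", "name", "total"])


def export_v2(rows):
    # Hypothetical agent refactor: same data, columns reordered.
    return _write(rows, ["id", "total", "name"])


def in_code_test(export):
    # What the test suite checks: values looked up by header name.
    parsed = list(csv.DictReader(io.StringIO(export(ROWS))))
    return parsed == [{"id": "1", "name": "Ada", "total": "30"}]


def customer_macro(export):
    # What a customer's spreadsheet macro assumes: total is the third column.
    data_line = export(ROWS).splitlines()[1]
    return data_line.split(",")[2]


if __name__ == "__main__":
    for export in (export_v1, export_v2):
        print(export.__name__, in_code_test(export), customer_macro(export))
```

Both versions satisfy the test, so nothing in the suite stops the second one from shipping. Only the assumption that lives outside the code, in the customer's workflow, breaks.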

Right now agents do this at a rate far higher than humans do. This is empirically demonstrable by the fact that an agent requires tests to keep from spinning out of control when writing more than a few thousand lines and a human does not. A human is capable of writing tens of thousands of lines with no tests, using only reason and judgment. An agent is not.

They clearly lack the full capability of human reason, judgment, taste, and agency.

My suspicion is that something close enough to AGI that it can essentially do all white-collar jobs is required to solve this.