> A messy codebase is still cheaper to send ten agents through than to staff a team around
People who say that haven't used today's agents enough or haven't looked closely at what they produce. The code they write isn't messy at all. It's more like asking an agent to construct a building from floor plans and a spec: it produces everything in the right measurements and the right colours, and it passes all the tests. Except then you find out that the walls and beams are made of foam and the art is load-bearing. The entire construction is just wrong, hidden behind a nice exterior. And when you need to add a couple more floors, the agents can't "get through it" and neither can people. The codebase is bricked.
Today's agents are simply not capable enough - without very close and labour-intensive human supervision - to produce code that can last through evolution over any substantial period of time.
Debugging would suffer as well, I assume. There's this old adage that if you write the cleverest code you can, you won't be clever enough to debug it.
There's nothing really stopping agents from writing the cleverest code they can. So my question is, when production goes down, who's debugging it? You don't have 10 days.
> the art is load-bearing
This is beautiful
The problem is, the MBAs running the ship are convinced AI will solve all that with more datacenters. The fact that they talk about gigawatts of compute tells you how delusional they are. Further, the collateral damage this delusion will cause as these models sigmoid their way into agents, harnesses, expert models, fine-tuned derivatives, and cascading, manifold, intelligent word-salad exercises shouldn't be underestimated.
Something is missing in the common test suite if this can occur, right?
A lot of that can be overcome by making the ability to put more floors on top part of the spec. Whether the builders are humans or agents, people rarely state that requirement explicitly; they treat it as assumed knowledge.
It goes the other way quite often with people. How often do you see K8s for small projects?
This just sounds like incomplete specs to me. And poor testing.
They can work really well if you put sufficient upfront engineering into your architecture and its guardrails, such that neither agents nor humans can basically produce incorrect code in the codebase. If you just let them rip without that, they require very heavy babysitting. With it, they're a serious force multiplier.