> adversarial AI reviewers, runtime tests (also by AI), or something else?
And spec management, change previews, feedback capture at runtime, skill libraries, project scaffolding, task scoping analysis, etc.
Right now this stuff is all rudimentary, DIY, or non-existent. As the more effective ways to use LLMs become clearer, I expect we'll see far more polished, tightly integrated tooling built to use LLMs in those ways.
Agents require tests to keep from spinning out of control when writing more than a few thousand lines, but we know that tests are wildly insufficient to describe the state of the actual code.
You are essentially saying that we should develop other methods of capturing the state of the program to prevent unintended changes.
However, there’s no reason to believe that these other systems will be any easier to reason about than the code itself. If we had such methods of ensuring that observable behavior doesn’t change, and they were substantially easier than reasoning about the code directly, they would be very useful for human developers as well.
The fact that we haven’t developed something like this in 75 years of writing programs suggests it’s probably not as easy as you’re making it out to be.