250K lines in a month — okay, but what does review actually look like at that volume?
I've been poking at security issues in AI-generated repos and it's the same thing: more generation means less review. Not just logic review, but also checking what's in your .env, whether API routes have auth middleware, and whether debug endpoints made it to prod.
You can move that fast. But "review" means something different now. Humans make human mistakes. AI writes clean-looking code that ships with hardcoded credentials because some template had them and nobody caught it.
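Some of these checks can be automated cheaply. Below is a minimal sketch of a line-by-line scanner for hardcoded credentials and debug routes. The regex patterns, the `scan_text`/`scan_repo` names and the file-suffix list are illustrative assumptions, not any particular tool's rule set. Real secret scanners use much larger, tuned rule sets and still miss things.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumption); real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    # Something like API_KEY / SECRET_KEY / PASSWORD / TOKEN assigned a quoted literal.
    "generic_assignment": re.compile(
        r"""(?i)(api[_-]?key|secret|password|token)\w*\s*[:=]\s*['"][^'"]{8,}['"]"""
    ),
}
# Quoted route strings starting with /debug or /_debug (hypothetical convention).
DEBUG_ROUTE = re.compile(r"""['"]/(debug|_debug)[^'"]*['"]""")


def scan_text(name: str, text: str) -> list[tuple[str, int, str]]:
    """Return (file, line_number, rule) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((name, lineno, rule))
        if DEBUG_ROUTE.search(line):
            findings.append((name, lineno, "debug_route"))
    return findings


def scan_repo(root: str, suffixes=(".py", ".js", ".ts")) -> list[tuple[str, int, str]]:
    """Scan source files plus any .env* dotfiles under root."""
    findings = []
    for path in Path(root).rglob("*"):
        # Path(".env").suffix is "", so dotenv files are matched by name instead.
        if path.is_file() and (path.suffix in suffixes or path.name.startswith(".env")):
            findings.extend(scan_text(str(path), path.read_text(errors="ignore")))
    return findings
```

Running something like this as a pre-commit or CI step catches the "template had them and nobody caught it" case; it does nothing for missing auth middleware, which still needs a human or framework-aware check.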
All these frameworks are racing to generate faster. Nobody's solving the verification side at that speed.
I've been trying to beat this drum for a minute now. Your code quality is a function of validation time, and you have a finite amount of it, which better orchestration doesn't increase.
My rant about this: https://sibylline.dev/articles/2026-01-27-stop-orchestrating...
I agree with this to some degree. Agents often stub and take shortcuts during implementation. I've been working on this problem a little with open-artisan, which I published yesterday (https://github.com/yehudacohen/open-artisan).
Rather than letting agents manage their own code lifecycle, define a state machine where code moves from agent to agent and isolated agents critique each other's code until it is of excellent quality.
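A minimal sketch of that kind of state machine, assuming three isolated agent roles (implementer, critic, reviser) modeled as plain callables. The stage names, the `run_pipeline` function and the `max_rounds` escalation cap are hypothetical illustrations, not open-artisan's actual design or API.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable


class Stage(Enum):
    IMPLEMENT = auto()
    CRITIQUE = auto()
    REVISE = auto()
    ACCEPTED = auto()
    ESCALATED = auto()  # critic never approved within the round budget


@dataclass
class Review:
    approved: bool
    notes: list[str] = field(default_factory=list)


# Hypothetical agent signatures: each agent only sees the inputs it is handed.
Implementer = Callable[[str], str]              # task -> code
Critic = Callable[[str, str], Review]           # (task, code) -> review
Reviser = Callable[[str, str, list[str]], str]  # (task, code, notes) -> new code


def run_pipeline(task: str, implement: Implementer, critique: Critic,
                 revise: Reviser, max_rounds: int = 3):
    """Drive code through IMPLEMENT -> CRITIQUE <-> REVISE until accepted or escalated."""
    stage, code, notes, rounds = Stage.IMPLEMENT, "", [], 0
    history = []
    while stage not in (Stage.ACCEPTED, Stage.ESCALATED):
        history.append(stage)
        if stage is Stage.IMPLEMENT:
            code = implement(task)
            stage = Stage.CRITIQUE
        elif stage is Stage.CRITIQUE:
            review = critique(task, code)
            if review.approved:
                stage = Stage.ACCEPTED
            elif rounds >= max_rounds:
                stage = Stage.ESCALATED  # hand off to a human instead of looping forever
            else:
                notes = review.notes
                stage = Stage.REVISE
        elif stage is Stage.REVISE:
            code = revise(task, code, notes)
            rounds += 1
            stage = Stage.CRITIQUE
    history.append(stage)
    return code, stage, history
```

The key design choice is that no agent decides when the work is done: transitions are owned by the loop, and the critic's verdict (or the round cap) is the only way out. The cap also bounds the token cost.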
This is still a bit of a token-hungry solution, but it seems to be working reasonably well so far, and I'm actively refining it as I build.
It's not going to give you formal verification, but strategies like this might be worth looking into.
I have been ~obsessed~ with exactly this problem lately.
We built AI code generation tools, and suddenly the bottleneck became code review. People built AI code reviewers, but none of the ones I've tried are all that useful: usually, by the time the code hits a PR, the issues are so large that an AI reviewer comes in too late.
I think the solution is to push review closer to the point of code generation, catch any issues early, and course-correct appropriately, rather than waiting until an entire change has been vibe-coded.
You can use AI to audit and review. You can add a constraint that credentials must never hit disk. In my case, the AI uses sed to read my env files, so the credentials don't even show up in the chat.
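The same idea can be sketched without sed: give the agent a helper that reports which keys an env file defines while masking every value, so secrets never enter the chat transcript. The `masked_env` and `describe_env_file` names and the `<set>`/`<empty>` placeholders are assumptions for illustration, and the parser only handles simple `KEY=VALUE` lines (no multi-line values or variable expansion).

```python
from pathlib import Path


def masked_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines and return each key with its value masked."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and anything that isn't an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, value = line.split("=", 1)
        value = value.strip().strip('"').strip("'")
        # Report only whether a value exists, never the value itself.
        result[key.strip()] = "<set>" if value else "<empty>"
    return result


def describe_env_file(path: str) -> str:
    """Render a masked view of an env file that is safe to paste into a chat."""
    masked = masked_env(Path(path).read_text())
    return "\n".join(f"{key}={state}" for key, state in masked.items())
```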
Things have changed quite a bit. I hope you give GSD a try yourself.
Code is a cost. It seems everyone's forgotten.
Saying "I generated 250k lines" is like saying "I used 2500 gallons of gas". Cool, nice expense, but where did you get? Because if it's three miles, you're just burning money.
250k lines is roughly SQLite or Redis in project size. Do you have SQLite-maintaining money? Did you get as far as Redis did in outcomes?