Simon, if you're reading this, I'd be really curious to hear your thoughts on how to effectively conduct code reviews in a world where "code is cheap".
One of the biggest struggles I have on my team is coworkers straight up vibe-coding parts of the codebase and not understanding or guiding the architecture of subsystems. Or at least, not writing code in a way that is meant to be understood by others.
Then when I go through the code and provide extensive feedback (mostly architectural, plus highlighting odd inconsistencies in the additions), I'm met with a lot of pushback because "it works, why change it?" Not to mention the sheer size of PRs ballooning in recent months.
The end result is me being the bottleneck because I can't keep up with the "pace" of code being generated, and feeling a lot of discomfort and pressure to lower my standards.
I've thought about using a code review agent to review and act as my proxy, but not being able to control the exact output worries me. And I don't like the lack of human touch it provides. Maybe someone has advice on a humane way to handle this problem.
Agent-based code reviews are what you want. But you have to set it up with really good context about what is wanted. You then review the reviews and keep improving the context it is working with. Make sure that context is put into everyone's global context as well.
Weirdly, this article doesn't really talk about the main agentic pattern:
- Plan (really important to start with a plan before any code changes). Iteratively build a plan to implement something. You can also have a collective review of the plan: make sure it's what you want and that there is guidance about how it should be implemented in terms of architecture (it should also be pulling in pre-existing context about your architecture/coding standards) and what testing should be built. Make sure the agent reviews the plan; ask the agent to make suggestions and ask questions
- Execute. Make the agent (or multiple agents) execute on the plan
- Test / Fix cycle
- Code Review / Refactor
- Generate Test Guidance for QA
Then your deliverables are Code / Feature context documentation / Test Guidance, plus evolving your global/project context
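The cycle above can be sketched as plain control flow. Everything here is hypothetical: `run_agent` is a stub standing in for whatever agent CLI or API you actually use, but the sketch shows where the plan review and the bounded test/fix loop sit relative to execution and the final deliverables.

```python
def run_agent(task: str, context: str) -> str:
    """Stand-in for a real coding-agent call (Claude Code, etc.)."""
    return f"[agent output for: {task}]"

def agentic_cycle(feature: str, project_context: str, max_fix_rounds: int = 3) -> dict:
    # 1. Plan first, and review the plan before any code is written.
    plan = run_agent(f"draft an implementation plan for: {feature}", project_context)
    plan = run_agent(f"review this plan, ask questions, suggest changes:\n{plan}",
                     project_context)

    # 2. Execute against the agreed plan.
    code = run_agent(f"implement this plan:\n{plan}", project_context)

    # 3. Test / fix cycle, bounded so it cannot loop forever.
    for _ in range(max_fix_rounds):
        failures = run_agent(f"run the tests for:\n{code}", project_context)
        if "FAIL" not in failures:
            break
        code = run_agent(f"fix these failures:\n{failures}", project_context)

    # 4. Code review / refactor, then generate QA guidance.
    code = run_agent(f"review and refactor:\n{code}", project_context)
    qa_guidance = run_agent(f"write test guidance for QA for:\n{code}", project_context)

    # Deliverables: code, feature context documentation, QA test guidance.
    return {"code": code, "feature_doc": plan, "qa_guidance": qa_guidance}

result = agentic_cycle("add rate limiting to the API", "team architecture notes")
```

The bound on the fix loop matters: an unbounded agent retry loop is where runaway sessions come from.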
"It works, why change it?" is a horrible attitude, but it's an organizational and interpersonal problem, not a technical one. They're only 1/3 of the way done according to Kent Beck.¹
There are plenty of orgs using AI who still care about architecture and having easily human-readable, human-maintainable code. Maybe that's becoming an anachronism, and those firms will go the way of the Brontosaurus. Maybe it will be a competitive advantage. Who knows?
¹ "Make it work, make it right, make it fast."
The way I am handling this: investing heavily in static and dynamic analysis.
- A lot more linting rules than ever before, plus custom rule sets that do org- and project-level validations.
- Stricter type enforcement in type-optional languages; stronger and deeper typing in all of them.
- Beyond unit tests: test-quality tooling like mutation testing (Stryker) and property-based testing (QuickCheck), if you can get that precise.
- Many more DX scripts and build harnesses that are specific to org and repo practices, the kind junior/new devs usually learn over time.
- On the dynamic side, per-pull-request environments with e2e tests that agents can validate against and iterate on when things don't work.
- Documentation generation and skill curation. After doing a batch of pull request reviews I spend time seeing where the gaps are in repo skills and agents.
All this becomes pre-commit heavy, and laptops cannot keep up in monorepos, so we ended up doing more on remote containers on beefy machines and also investing in task caching (Nx/Turborepo have this).
Reviews (agentic or human) have their uses, but catching all this at review time is high-latency and inefficient; it tends to miss things, and we become the bottleneck.
The earlier the coder (human or agent) gets repeatable, consistent feedback, the better.
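To make the mutation-testing bullet concrete: tools like Stryker automate the idea of mutating the code, re-running the tests, and expecting them to fail ("killing the mutant"). This is a toy hand-rolled sketch of that idea, not how Stryker works internally (real tools mutate ASTs; this just swaps an operator in the source text).

```python
def run_tests(add):
    """A deliberately thorough test 'suite' for an add() function."""
    return add(2, 3) == 5 and add(-1, 1) == 0

original_src = "def add(a, b):\n    return a + b\n"

def load(src):
    # Execute the source and pull out the function it defines.
    ns = {}
    exec(src, ns)
    return ns["add"]

# The suite passes on the original code.
assert run_tests(load(original_src))

# Introduce a mutant: '+' becomes '-'. A good suite must kill it,
# i.e. at least one test must now fail.
mutant_src = original_src.replace("a + b", "a - b")
mutant_killed = not run_tests(load(mutant_src))
```

A suite that lets mutants survive has coverage without assertions that matter, which is exactly the failure mode agent-generated tests tend toward.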
> I'm met with much pushback because "it works, why change it"?
This is an educational problem, and is unlikely to be easy to fix in your team (though I might be wrong). I would suggest changing to a team or company with a culture that values being able to reason about one's software.
We make the creator of the PR responsible for the code. Meaning they must understand it.
Also, we only allow engineers to commit (agent generated) code. Designers just come up with suggestions, engineers take it and ensure it fits our architecture.
We do have a huge codebase. We are teaching Claude Code with CLAUDE.md files and now also <feature>.spec.md files (often a summary of the implementation plan).
In the end, engineers are responsible.
Can you document the hard architectural requirements of your codebase, and keep them up to date? If you can, you can require your coworkers to always use those requirements during their prompting/planning for their implementations, and you can feed them to an agent and have it review the code.
But more proactively, if people aren't going to write their own code, I think there needs to be a review process around their prompts, before they generate any code at all. Make this a formal process, generate the task list you're going to feed to your LLM, write a spec, and that should be reviewed. This is not a substitute for code reviews, but it tends to ensure that there are only nitpick issues left, not major violations of how the system is intended to be architected.
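Documented architectural requirements are most useful when they are also machine-checkable, so the same rule text can go into an agent's context and into CI. This is a minimal sketch using Python's stdlib `ast` module; the rule itself ("handlers must not import the storage layer directly") and the layer names are hypothetical, stand-ins for whatever your architecture docs actually say.

```python
import ast

RULE = "handlers must not import 'storage' directly; go through 'services'"

def violations(source: str, filename: str) -> list[str]:
    """Report imports of the storage layer from handler modules."""
    found = []
    for node in ast.walk(ast.parse(source)):
        names = []
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        for name in names:
            # The rule only applies inside the (hypothetical) handlers layer.
            if filename.startswith("handlers/") and name.split(".")[0] == "storage":
                found.append(f"{filename}:{node.lineno}: {RULE}")
    return found

bad = "import storage\n\ndef handle(req):\n    return storage.get(req.id)\n"
```

Running `violations(bad, "handlers/users.py")` flags the import; the same file under `services/` passes, because the rule is scoped to the layer, not the module name alone.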
I'm running into this problem as well, with juniors slinging code that takes me a very long time to understand and review. I'm iterating on an AGENTS.md file to share with them, because they aren't going to stop using AI and I'm a little tired of always saying the same things (Claude loves to mock everything and assert that spies were called X times with Y arguments, which is a great recipe for brittle tests, for example).
I know they won’t stop using AI so giving them a directives file that I’ve tried out might at least increase the quality of the output and lower my reviewing burden.
Open to other ideas too :)
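The brittle-mock pattern described above is easy to show with stdlib `unittest.mock`; the function under test here is hypothetical. The first assertion pins the exact call, so any harmless refactor (batching, reordering) breaks the test; the second asserts only the observable behaviour.

```python
from unittest.mock import Mock

def notify_users(users, send):
    # Hypothetical function under test: email only the active users.
    for user in users:
        if user["active"]:
            send(user["email"], "hello")

send = Mock()
notify_users([{"email": "a@x.com", "active": True},
              {"email": "b@x.com", "active": False}], send)

# Brittle style (what the agent tends to write): pin the exact call
# signature and count, coupling the test to implementation details.
send.assert_called_once_with("a@x.com", "hello")

# More robust style: assert the behaviour you actually care about,
# here "only active users were emailed".
sent_to = [call.args[0] for call in send.call_args_list]
assert sent_to == ["a@x.com"]
```

A one-line directive in AGENTS.md ("assert outcomes, not call counts") plus an example like this has a better chance of sticking than repeating it in every review.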
How are the architecture changes you are proposing improving the end result?
>but not being able to control the exact output worries me
Why?
Code review should be mandatory; reviewers should ask for big PRs to be broken up, and their submitters should be able to defend every line of code. When the computer is generating the code, the most important duty of the submitter is to vouch for it. To do otherwise creates the bad incentive of making others do all your QA, and nobody is going to be rewarded for that.
Code review is now a bit like Brandolini's law: "The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it." You ultimately need a lot of buy in to spend more than 5 mins on something that took 5 seconds to produce.
This is genuinely one of the most interesting questions right now. I don't have solid answers yet, and I'm very keen to learn what people are finding works.
If you accelerate the pace of code creation it inevitably creates bottlenecks elsewhere. Code review is by far the biggest of those right now.
There may be an argument for leaning less on code review. When code is expensive to produce and is likely to stay in production for many years it's obviously important to review it very carefully. If code is cheap and can be inexpensively replaced maybe we can lower our review standards?
But I don't want to lower my standards! I want the code I'm producing with coding agents to be better than the code I would produce without them.
There are some aspects of code review that you cannot skimp on. Things like coding standards may not matter as much, but security review will never be optional.
I've recently been wondering what we can learn from security teams at large companies. Once you have dozens or hundreds of teams shipping features at the same time - teams with varying levels of experience - you can no longer trust those teams not to make mistakes. I expect that the same strategies used by security teams at Facebook/Google-scale organizations could now be relevant to smaller organizations where coding agents are responsible for increasing amounts of code.
Generally though I think this is very much an unsolved problem. I hope to document the effective patterns for this as they emerge.