Software engineering is software engineering.
An ace software engineer is not an ace because of tooling.
It's not the plane, it's the pilot, or something like that.
The current state of the technology is that you must read at least some of the code, but everyone keeps shipping tools that are focused on churning out more and more stuff without giving you any affordances to really understand the output.
Claude Code in particular seems really uninterested in this aspect of the problem, and I've stopped using it entirely because of this.
Correct me if I’m wrong, Simon, but weren’t you highly optimistic about LLMs and agentic use of them?
I believe this is a common fault of not being able to zoom out and look at what trade-offs are being made. There are always trade-offs; the question is whether you can define them and then do the analysis to determine whether the result leaves you in a net-benefit state.
Totally agree. The sales pitch is that anyone can use this stuff, but good output is only obtained via thorough understanding.
Never really bought that there was a clean distinction.
To me it’s a spectrum with varying levels of structure provided, review etc.
Basically one-shot vibes on one side, fully hand-coded on the other.
Still thinking about LLMs.
No offense, but it feels to me like the author wrote this piece to convince himself. I am afraid he is right. But the bottom line is the same: vibe coding, agentic engineering, everything AI-related is coming for our jobs.
the discourse around "code quality" has always attracted the least nuanced minds, ones who see the world and the phenomenon of life as nothing but territory to be divided up by the latest buzzwords. the worst ones insist that we narrow the discussion even further, to focus on the conflicts between these buzzwords. whenever i have to sit through such discussions, i try to meditate on the irony of mother nature weaving the most functionally brutal, ruthlessly redundant poetry that is the genetic code, only for the resulting creatures to deny themselves the power of the principles inherent in their own construction.
As agents get better at code we trust them to produce more of it. There are still bugs to find, but the haystack gets bigger.
So the number of bugs to find remains constant but the amount of code to review scales with the capability of the agent.
An AI cannot be held accountable for mistakes, so an AI should not be doing your job for you. End of discussion.
I agree to some extent. I think that small apps, dashboards, service wrappers, etc. you can vibe code.
But building software still requires domain knowledge, understanding data structures, architecture, and which services to use. We probably have 2-5 years before that's fully automated.
I am experimenting with writing an entire TypeScript compiler[1] with an AI assistant. I've spent 4 months on it already. It might not be successful at the end of the day, but my thinking is that if LLMs are going to write a lot of the code, I had better learn how this can and cannot work. I've learned a lot from this project already. I think we're still in charge of design and big ideas even if all of the code is written by AI.
I can't really say I agree with this, although I also hate the phrase "agentic engineering".
I'm working on a licensing system for a product I'm building. I've used Claude a little bit to help out with it, but it's also made a lot of very dumb decisions that would have large (security!) consequences if I didn't catch them. And a lot of them are braindead things. For example, I asked it to create a configurable limit on a certain resource for the trial version of the application. When I said configurable, I mostly meant: put the number in a constant so I can update it later. What Claude thought I asked was "make it so the user can modify the limits of the trial version in the settings panel" (which defeats the entire purpose of a free trial!). Another thing it messed up recently: I was setting up email magic-link authentication, and it defaulted to creating an account for anyone who typed in an email, which could allow a bad actor to both spam people with login requests (probably getting me kicked off Resend) and create a lot of bogus accounts.
These things do not think. You cannot outsource your thinking to them.
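For what it's worth, the fix for the magic-link mistake is a well-known pattern: only send links to existing accounts, and respond identically either way so the endpoint can't be used to probe which emails are registered. A minimal sketch in Python (the user store and "sent mail" list are hypothetical stand-ins, not any real library's API):

```python
import secrets

# Hypothetical stand-ins for a user table and an outgoing email queue.
USERS = {"alice@example.com"}
SENT = []

def request_magic_link(email: str) -> str:
    """Send a login link only to addresses that already have an account.

    Crucially: never auto-create an account here, and return the same
    response either way, so a caller can't tell registered addresses
    from unregistered ones.
    """
    if email in USERS:
        token = secrets.token_urlsafe(32)
        SENT.append((email, token))  # stand-in for actually emailing the link
    return "If that address has an account, a login link is on its way."
```

Account creation stays behind an explicit sign-up flow, which also removes the bogus-account and spam problems described above.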
The problem with vibe coding is that agentic output has a very plasticky, samey feel unless you work with something that makes it unique or can pass a template through it.
Why is it one or the other and not one THEN the other?
What the F is "agentic" really?
"But I’m not reviewing that code. And now I’ve got that feeling of guilt: if I haven’t reviewed the code, is it really responsible for me to use this in production?"
"I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it’s just going to do it right. It’s not going to mess that up. You have it add automated tests, you have it add documentation, you know it’s going to be good."
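For readers who haven't tried it, the task in that quote really is in the "hard to mess up" category. Stripped of any web framework, it reduces to something like this sketch (the table and query are invented for illustration):

```python
import json
import sqlite3

def query_to_json(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Run a parameterised SQL query and serialise the rows as JSON."""
    conn.row_factory = sqlite3.Row  # rows behave like dicts keyed by column name
    rows = conn.execute(sql, params).fetchall()
    return json.dumps([dict(row) for row in rows])

# Tiny in-memory fixture so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b")])
result = query_to_json(conn, "SELECT id, name FROM items ORDER BY id")
```

The point of contention in the post isn't whether a model can produce this; it's whether you still read it afterwards.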
This really is Wordpress and early PHP all over again, but it's the seasoned folks rather than the amateurs that buy into it.
I believe these tools will be refined and locked down and eventually turn into RAD stuff used by certified enterprise consultants, much like SAP and Salesforce and IBM solutions and so on. From this I come to the conclusion that it is not a good idea to become dependent on them at this stage, which is corroborated by the pecuniary expense as well as excruciatingly fast change in available products.
Agentic engineering? That reads to me a little like amateur oncologist. How are you defining engineering?
Can agentic engineers adhere to a similar code of ethics that a professional engineer is sworn to uphold?
https://www.nspe.org/career-growth/nspe-code-ethics-engineer...
I mean... yeah? Isn't it obvious that they're essentially the same thing, but one thinks they're in a higher class than the other?
Fast feedback loops and delegating tasks to sub-agents have been pretty common for vibers since well before they were canonicalized by agenteers. Same thing, different day, hardly even any difference in quality: they evolve together, though vibe tends to lead and agents follow and refine... which vibers then use too.
If you think of vibe coders as agentic alpha testers it makes a lot more sense.
I think this is what people mean when they say LLMs are a higher-level abstraction. We still need to consider edge cases and have tests. We still need to sweat the architecture, understand how the pieces fit together, and have a mental map of the codebase. But within each leaf node of that architecture we don't sweat the details. Anything obvious gets caught right away. Most subtle, interaction-based issues occur at the architecture level. Anything that bypasses those filters is a weird bug that is no worse or different from a normal bug: an edge case that was hit in a real-world scenario and gets flagged by a user or logged as an error.
There are certain codebases and pieces of code we definitely want every line to be reasoned and understood. But like his API endpoint example, no reason to fuss with the boilerplate.
This has definitely been my shift over the past few months, and the advantage is I can spend much more time and energy on getting the code architecture just right, which automatically prevents most of the subtle bugs that have people wringing their hands. The new bar is architecting code so it's as well defined as an API endpoint -> service structure, so you can rely on LLMs to paint by numbers for new features and logic.
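The endpoint -> service shape mentioned here is roughly: keep each handler a thin translation layer and put the logic in a plain function you can review and test in isolation. A hypothetical sketch (all names invented; the "request" is just a dict standing in for a real framework object):

```python
def renew_license(license_id: str, licenses: dict) -> dict:
    """Service layer: pure business logic, easy to unit-test and review."""
    record = licenses.get(license_id)
    if record is None:
        raise KeyError(license_id)
    return {**record, "renewals": record["renewals"] + 1}

def renew_license_endpoint(request: dict, licenses: dict) -> tuple[int, dict]:
    """Endpoint layer: input validation and status codes only, no logic."""
    license_id = request.get("license_id")
    if not license_id:
        return 400, {"error": "license_id is required"}
    try:
        return 200, renew_license(license_id, licenses)
    except KeyError:
        return 404, {"error": "unknown license"}
```

With this split, an LLM asked to add a new feature has a narrow template to follow, and review effort concentrates on the small service functions where the subtle bugs would live.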
> But I’m not reviewing that code (...)
That's the spirit, I always say: _others_ will deal with AI slop during code review. Eventually they will get tired and start 'reviewing' this AI stuff with AI, so it's a win-win. Right?
> I’m starting to treat the agents in the same way. And it still feels uncomfortable, because human beings are accountable for what they do. A team can build a reputation. I can say “I trust that team over there. They built good software in the past. They’re not going to build something rubbish because that affects their professional reputations.”
The most important part, and why slop isn't the same as code written by someone else. The model doesn't care; it just produces whatever it is asked to produce. It doesn't have pride, it doesn't have ego, it doesn't have artisanal qualities, it doesn't have ownership.
Every time I do deep work and think through solutions to a complex problem, I always have the opportunity to ask Claude to implement a sub-par AI-slop solution instead.
Do this enough times, and I will have forgotten how to think.
People in the future are going to wonder what the hell we were thinking when, 30 years down the line, everything is a hot mess of billions of lines of LLM-generated code that no human has read almost any of, and that is no longer possible for anyone to maintain, with or without LLMs. And the LLM-generated garbage will have drowned out all of the good-quality code that ever existed, and no one will be able to find even human-written code anymore on the internet.
Makes me want to just give up programming forever and never use a computer again.
I feel like an outlier in all of this. But isn't this just more AI slop? How is this different from text generation or image generation?
Like many people, I have used AI to generate crap I really don't care about. I need an image? Generate something like it, whatever. Great, hey, a good-looking image! Now that's done and I can do something I find more interesting.
But it's slop. The image does not fit the context. It's just off. And you can tell that no one really cared.
This isn't good.
> It used to be if you found a GitHub repository with a hundred commits and a good readme and automated tests and stuff, you could be pretty sure that the person writing that had put a lot of care and attention into that project.
I think this highlights a problem that has always existed under the surface, but it's being brought into the light by the proliferation of vibeslop and openclaw and their ilk. Even in the beforetimes you could craft a 100.0% pure, correct-looking GitHub repo that had never stood the test of production. Even if you had a test suite that covers every branch and every instruction, without putting the code in production you aren't going to uncover all the things your test suite didn't: performance issues, security issues, unexpected user behavior, etc.
As an observer looking at this repo, I have no way to tell. It's got hundreds of tests, hundreds of commits, dozens of stars... how am I to know nobody has ever actually used it for anything?
I don't know how to solve this problem, but it seems like there's a pretty obvious tooling gap here. A very similar problem is something like "contributor reputation", i.e. the plague of drive-by AI generated PRs from people (or openclaws) you've never seen before. Stars and number of commits aren't good enough, we need more.
I still don’t get what agentic engineering is. Isn’t it all just asking the same LLM what you want it to do?
huh. i honestly never thought they were all that different. didn't the same guy coin them both to refer to the same thing?
> I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it’s just going to do it right. It’s not going to mess that up.
> Claude Code does not have a professional reputation!
how come?
Reminder: cybersecurity will be huge in the coming years.
Companies are shipping things and nobody understands what they're shipping.
I agree. I'm actually generating just over 20,000 lines of code each day at my company. Part of that was the mandate and leaderboards around token usage, but they also started using pull requests as an explicit metric. What I do is usually pull around 5 or so tickets at once, spin up 5 different agents on their own branches, have them work until completion, and then spin up two more agents to handle the merge request.
I'm not checking the code since the code doesn't really matter anymore anyways - I just have the agent write passing tests for the changes or additions I make, and so even if something breaks I can just point to the tests.
Some days, the tickets are completed much faster than I expect and I don't hit my daily token expenditure goal, so I have my own custom harness that actually hooks up an agent to TikTok; basically it splits up the reel into 1-second increments and then feeds those frames to the LLM for its own consumption. I can easily burn 10m tokens a day on this, and Claude seems to enjoy it.
Personally I want to thank you Simon for putting me onto this "vibe engineering" concept, I really didn't expect an archaeology major like myself to become a real engineer but thanks to AI now I can be! Truly gatekeeping in tech is now dead.
For work I do agentic engineering: the code that I submit for a code review is hand-reviewed by me. I know every line and file that I submit.
My side project is 80% vibe code. Every now and then I look and see all the bad stuff, then I scold Codex a bit and it refactors it for me. So I do see the author's point.
Instead of "vibe coding" by asking the AI to design and write code, I'm having it refine my own designs, and write code under strict supervision and guidance, that I carefully review and iterate on.
I took a rock carving course in school that really enlightened me about software engineering, and it still applies today, especially to AI. You can't just decide what you want to carve, hold the chisel in just the right spot, and whack it with a hammer just perfectly so all the rock you want falls away leaving a perfect statue behind.
"I saw the angel in the marble and carved until I set him free." -Michelangelo
It's a long drawn out iterative process of making millions of tiny little chips, and letting the statue inside find its way out, in its natural form, instead of trying to impose a pre-determined form onto it.
Vibe coding is hoping your first whack of the hammer is going to make a good statue, then not even looking at the statue before shipping it!
But AI assisted conscientious coding (or agentic engineering as Simon calls it) is the opposite of that, where you chip away quickly and relentlessly, but you still have to carefully control where you chisel and what you carve away, and have an idea in your mind what you want before you start.
> I know full well that if you ask Claude Code to build a JSON API endpoint that runs a SQL query and outputs the results as JSON, it’s just going to do it right. It’s not going to mess that up. You have it add automated tests, you have it add documentation, you know it’s going to be good.
> But I’m not reviewing that code. And now I’ve got that feeling of guilt: if I haven’t reviewed the code, is it really responsible for me to use this in production?
Answer: it wholly depends upon what management has dictated be the goal for GenAI use at the time.
There seems to be a trend of people outside of engineering organizations thinking that the "iron triangle" of software (and really, all) engineering no longer holds. Fast, cheap, good: now we can pick all three, and there's no limit to the first one in particular. They don't see why you can't crank out 10x productivity. They've been financially incentivized to think that way, and really, they can't lose if they look at it from an "engineer headcount" standpoint. The outcomes are:
1) The GenAI-augmented engineer cranks out 10x productivity without any quality consequences down the line, and keeps them from having to pay other people
or
2) The GenAI-augmented engineer cranks out 10x productivity with quality consequences down the line, at which point the engineer has given another exhibit in the case as to why they should no longer be employed at that organization. Let the lawyers and market inertia deal with the big issues that exist beyond the 90-day fiscal reporting period.
Either way, they have a route to the destination of not paying engineers, and that's the end goal.
If you don't like that way of running a software engineering organization, well, you're not alone, but if nothing else, you could use GenAI to make working for yourself less risky.
Simon,
Just piggy backing on this post since I'm early:
Would love to see your take on how the AI and Django worlds will collide.
Honestly, I think the idea that devs will still be needed is total copium; the progress made in two years is astounding, and in two years' time they will be better at programming than 99% of programmers. It's incredible what they can do now. No, it's not perfect, but imagine where we'll be in 5 or 10 years.
"Code quality" was always a mirage, imo. Logic is what matters. I've used the internet from the early days, and probably 99% of the software I used had serious bugs. Ultima Online was mentioned on HN recently: it was a real bug-and-exploit fest. Banks, AAA games, companies like Uber with thousands of engineers: they all had serious problems (and that's still true). It would be worse if some engineers didn't have that drive to write high-quality code, but we have to admit that was never enough. Even now with Claude Code, I see a lot of "specifications" that are far from specified enough, and people blame the LLM.
man i love this post
I'd be lying if I said I was not worried about the future. I am not necessarily worried in the sense that there will be some grave, impending doom awaiting the future of humanity.
Rather, I just feel like I have to constantly remind myself of the impermanence of all things. Like snow, from water come to water gone.
Perhaps I put too much of my identity in being a programmer. Sure, LLMs cannot replace most of us in their current state, but what about 5 years, 10 years, ..., 50 years from now? I just cannot help but feel a sense of nihilism and existential dread.
Some might argue that we will always be needed, but I am not certain I want to be needed in such a way. Of course, no one is taking hand-coding away from me. I can hand-code all I want on my own time, but occupationally that may be difficult in the future. I have rambled enough, but all in all, I do not think I want to participate in this society anymore, and I do not know how to escape it either.
I grew up on construction sites with my dad. If I've done well in my career, it was from watching him operate: managing huge construction crews, figuring out who to put on what tasks, handling surprises, setbacks, all that stuff.
My dad (now retired) was always super practical about stuff. He'd tell me pretty nonchalantly things like "yeah, we're dealing with xyz constraint, we may have to cut a corner over here, but that's ok." When I asked him about it, he gave me a little spiel: you can be thoughtful about how you do things, including when you can cut a corner and, more importantly, which corners are ok to cut.
I really took that to heart - especially the "be thoughtful about the corners you cut"
If an LLM has consistently one shotted certain tasks and they are rote/mechanical - not reviewing that code is probably ok.
Are you getting lazy and not reviewing stuff that should be reviewed even if a human wrote it? That's probably not ok
I can live with some basic code that broke because it used outdated syntax somewhere (provided the code isn't part of a mission-critical application), but I can't live with it fucking up JWT signing etc.
> my disturbing realization that vibe coding and agentic engineering have started to converge in my own work.
> I firmly staked out my belief that “vibe coding” is a very different beast from responsible use of AI to write code, which I’ve since started to call agentic engineering
Disturbing? Really? I admit I don't do agentic work and am going only by vibes, but for me agentic engineering is basically vibe coding in an automated loop with some ornaments. They both stem from the same LLM root, and positioning them as significantly different is weird and unconvincing to me. There may be merit to this article (I gave up after a few sentences), but I reject this specific premise.