I just don’t see it coming. I was fully in that camp 3 months ago, but I’ve realized every step introduces more mistakes. It leads to a deadlock once no human has the mental model anymore.
Don’t you guys have hard business problems that AI just can’t solve, or solves only very slowly, presenting you 17 ideas until it finds the right one? I’m using the most expensive models.
I think the nature of AI might block that progress, and I think some companies have woken up and others will wake up later.
The mistake rate is just too high. And every system you implement to reduce that rate has its own mistake rate, and it increases complexity and the necessary exploration time.
I think the bulk of people are now where the early adopters were in December: AI can implement working functionality on a well-maintained codebase.
But it can’t write maintainable code itself. It actually makes you slower compared to writing the code assisted, because assisted you are much more in the loop and you can stop a lot of small issues right away. And you iterate on everything fast.
I hadn’t opened my IDE in a month, and at one point it became hell. I’ve now deleted 30k lines, and the number of issues I’m seeing has been an eye-opening experience.
Unscalable performance issues, verbosity, straight-up bugs, escape hatches against my verification layers, quintupled types.
Now I could monitor the AI output more closely, but then again I’m faster writing it myself, because it’s a single task. AI-assisted typing isn’t slower than my brain.
Also, thinking more about it: FAANG reportedly pays ~$300 per line of production code, so what are we really trying to achieve here? Speed was never the issue. A great coder writes 10 production lines per day.
Accuracy, architecture, etc. are the issue. You get those by building solid fundamental blocks that make feature additions easier over time, not slower.
I know it’s not your main point, but I’m curious where $300/line comes from. I don’t think I’ve ever seen a dollar amount attached to a line of production code before.
I think this sounds like a true yet short-sighted take. Keep in mind these features are immature, but they exist to create a flywheel and corner the market. I don’t know why, but people seem to consistently miss two points and their implications:
- performance is continuing to increase incredibly quickly, even if you rightfully don’t trust a particular evaluation. Scaling laws like Chinchilla’s, and RL scaling laws (both training-time and test-time), predict this.
- coding is a verifiable domain
The second one is the most important. Agent quality is NOT limited by the human code in the training set; that code is simply used for efficiency: it gets you to a good starting point for RL.
Claiming that things will not reach superhuman performance, INCLUDING on all end-to-end tasks (understanding a vague, poorly articulated business objective; architecting a system; building it out; testing it; maintaining it; fixing bugs; adding features; refactoring; etc.), is what carries the burden of proof, because we can literally predict performance (albeit via a complicated relationship between benchmarks and real-world performance).
Yes, definitely, error rates are so far too high for this to be totally trusted end to end, but the error rates are improving consistently, and this is what the METR time-horizon benchmark captures.
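The "coding is a verifiable domain" point above can be sketched concretely: a candidate solution can be scored automatically by running tests against it, which is exactly the kind of reward signal RL can optimize without needing human code as a ceiling. Everything in this toy sketch (the task, the test cases, the two "model samples") is invented for illustration.

```python
# Toy sketch of code as a verifiable domain: reward = fraction of
# unit tests a candidate solution passes. All names are hypothetical.

def run_tests(candidate_fn) -> float:
    """Score a candidate for the toy task 'add two numbers'."""
    cases = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
    passed = 0
    for args, expected in cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate simply earns no reward
    return passed / len(cases)

# Two hypothetical "model samples" for the same task:
good = lambda a, b: a + b
bad = lambda a, b: a - b

print(run_tests(good))  # 1.0
print(run_tests(bad))   # partial credit, strictly below 1.0
```

The reward is computed mechanically, so it scales with compute rather than with the amount of human-written code available.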
Wow so many replies.
I think it comes down to two camps: people saying AI is improving on these issues, and people countering that.
I don’t know for sure, but to me it seems the last 2 years weren’t necessarily 'intelligence' improvements so much as post-training improvements and tool connections, plus reduced censorship.
I’m now using less AI than ever, and I was burning 1000 USD/month before Claude Code. I have a couple of really fundamental functions built that help me solve a big chunk of my specific problems, and I can build a lot on top of them. Adding functionality became easier, not more complicated.
For the business problems I’m facing, I’d estimate AI is right less than 30% of the time. For example, deciding how to set up databases for maximum efficiency, or how to write efficient queries. Everything that, in the end, is a real moat compared to your vibe-coded competitors.
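As a minimal sketch of the kind of query-efficiency decision meant here: whether a filter runs as a full table scan or an index lookup is exactly the sort of thing you can verify yourself with an `EXPLAIN QUERY PLAN`. The schema below (an `orders` table with a `customer_id` column) is invented for illustration, using SQLite so it runs standalone.

```python
import sqlite3

# Hypothetical schema: 10k orders spread across 100 customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

# Without an index on customer_id, this filter is a full table scan.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(plan_before)  # plan detail reads along the lines of "SCAN orders"

# The right index turns the same query into an index lookup.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(plan_after)  # plan detail now mentions "USING INDEX idx_orders_customer"
```

On small tables both plans feel instant; the difference only shows up at scale, which is why these decisions are easy to get wrong without deliberate checking.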
From my personal experience, I’ve seen a lot of vibe-coded companies get stuck, barely adding necessary functionality or features, and my guess is that they don’t trust changes anymore.
So even if AI were as good as a really good coder, one thing would still be missing: a person who knows exactly what is happening.
And I mean, okay, it might write a form really quickly. But a modern form needs to do a lot of things, and if you have established patterns for all kinds of inputs, the implementation is mundane anyway.
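"Established patterns for all kinds of inputs" can be sketched as a small set of reusable validators that every new form reuses, at which point wiring up another form is mechanical. The validators, field names, and `SignupForm` below are all hypothetical, not any particular framework's API.

```python
from dataclasses import dataclass
import re

# Hypothetical reusable input patterns: write each validator once,
# then every new form field is a one-liner.

def validate_nonempty(value: str) -> str:
    if not value.strip():
        raise ValueError("field must not be empty")
    return value.strip()

def validate_email(value: str) -> str:
    value = value.strip()
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value):
        raise ValueError(f"invalid email: {value!r}")
    return value.lower()  # normalize for storage

@dataclass
class SignupForm:
    name: str
    email: str

    def __post_init__(self):
        # Composing existing patterns is the mundane part.
        self.name = validate_nonempty(self.name)
        self.email = validate_email(self.email)

form = SignupForm(name="Ada", email="Ada@Example.com")
print(form.email)  # ada@example.com
```

Once this layer exists, a generated form mostly just picks validators off the shelf, which is why the generation step saves less time than it appears to.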
It’s like when you learn coding: type it yourself to learn. So if you can’t scale the AI-only codebase, at some point you have to learn it, and I’d argue that right now the most efficient way is to write in it yourself.
And I’m also arguing that it’s really tough to get software good enough to actually be a market asset when it’s vibe-coded only. It seems to be more of a drug for wannapreneurs than a way of actually building an asset.
Like, it builds you a Netflix clone, but what you see is barely the code you’d need to build a Netflix competitor.