Hacker News

alexpotato · today at 1:56 PM

I work as a DevOps/SRE and have been doing it in FinTech (banks, hedge funds, startups) and Crypto (an L1 chain) for almost 20 years.

My thoughts on vibe coding vs production code:

- vibe coding can 100% get you to a PoC/MVP, probably 10x faster than pre-LLM workflows

- This is partly b/c it is good at things I'm not good at (e.g. front end design)

- But then I need to go in and double check performance, correctness, information flow, security etc

- The LLM makes this easier but the improvement drops to about 2-3x b/c there is a lot of back and forth + me reading the code to confirm etc (yes, another LLM could do some of this, but then that needs to get set up correctly etc)

- The back and forth part can be faster if e.g. you have scripts/programs that deterministically check outputs

- Testing workloads that take hours to run still take hours whether a human or an LLM is running them (aka that is still the bottleneck)

So overall, this is why I think we're getting wildly different reports on how effective vibe coding is. If you've never built a data pipeline and an LLM can spin one up in a few minutes, you think it's magic. But if you've spent years debugging complicated trading or compliance data pipelines, you realize the LLM is saving you some time, but not 10x time.
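A minimal sketch of what I mean by a deterministic output check (file names and the golden hash are hypothetical; assumes the code under test writes its output to a file):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

// Hedged sketch: deterministically verify an LLM-modified program's output
// by hashing it and comparing against a known-good "golden" hash recorded
// from a trusted run. Names here are illustrative, not from any real setup.
public class GoldenCheck {
    static String sha256Hex(byte[] data) throws Exception {
        return HexFormat.of().formatHex(
            MessageDigest.getInstance("SHA-256").digest(data));
    }

    public static void main(String[] args) throws Exception {
        Path output = Path.of(args[0]);  // file produced by the code under test
        String expected = args[1];       // golden hash from a trusted run
        String actual = sha256Hex(Files.readAllBytes(output));
        if (!actual.equals(expected)) {
            System.err.println("MISMATCH: " + actual);
            System.exit(1);              // fail fast so the agent loop stops here
        }
        System.out.println("OK");
    }
}
```

A non-zero exit code is the point: the agent gets a binary pass/fail signal instead of a judgment call, which tightens the back-and-forth loop.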


Replies

matt_heimer · today at 2:47 PM

I'm building a Java HFT engine and the amount of things AI gets wrong is eye-opening. If I didn't benchmark everything I'd end up with a much less optimized solution.

Examples: AI really wants to use Project Panama (FFM), and while that can be significantly faster than traditional OO approaches, it is almost never the best option. And I'm not talking about using deprecated Unsafe calls; I'm talking about plain primitive arrays being better for Vector/SIMD operations on large sets of data, and NIO being better than FFM + mmap for file reading.

You can use AI to build something that is sometimes better than what someone without domain specific knowledge would develop but the gap between that and the industry expected solution is much more than 100 hours.
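To sketch the primitive-array point (class name and sizes are illustrative): a plain counted loop over `double[]` is exactly the shape HotSpot's auto-vectorizer can turn into SIMD instructions, no FFM involved.

```java
// Hedged sketch: a simple counted loop over primitive arrays.
// HotSpot's C2 JIT can auto-vectorize this pattern; an FFM/MemorySegment
// version of the same computation often benchmarks worse.
public class DotProduct {
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];  // SIMD-friendly: contiguous primitives, no bounds tricks
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = new double[1_000_000];
        double[] b = new double[1_000_000];
        java.util.Arrays.fill(a, 2.0);
        java.util.Arrays.fill(b, 3.0);
        System.out.println(dot(a, b)); // prints 6000000.0
    }
}
```

Which variant actually wins depends on the JDK version and data sizes, which is why you have to benchmark rather than trust the AI's first suggestion.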

Aurornis · today at 3:24 PM

There’s a big gap between reality and the influencer posts about LLMs. I agree with you that LLMs do provide some significant acceleration, but the influencers have tried to exaggerate this into unbelievable numbers.

Even non-influencers are trying to exaggerate their LLM skills as a way to get hired or raise their status on LinkedIn. I rarely read the LinkedIn social feed but when I check mine it’s now filled with claims from people about going from idea to shipped product in N days (with a note at the bottom that they’re looking for a new job or available to consult with your company). Many of these posts come from people who were all in on crypto companies a few years ago.

The world really is changing but there’s a wave of influencers and trend followers trying to stake out their claims as leaders on this new frontier. They should be ignored if you want any realistic information.

I also think these exaggerated posts are causing a lot of people to miss out on the real progress that is happening. They see these obviously false exaggerations and think the opposite must be true, that LLMs don’t provide any benefit at all. This is creating a counter-wave of LLM deniers who think it’s just a fad that will be going away shortly. They’re diminishing in numbers but every LLM thread on HN attracts a few people who want to believe it’s all just temporary and we’re going back to the old ways in a couple years.

cvhc · today at 8:53 PM

> Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)

Actually, I had some terrible experiences when asking the agent to do something simple in our codebase (like renaming files and fixing build scripts and dependencies): it took much longer than a human would, because it kept running the full CI pipelines to check for problems after every attempted change.

A human would, for example, rely on the linter to detect basic issues, run a partial build on the affected targets, etc. to save time. But the agent probably doesn't have a sense of elapsed time.

bittermandel · today at 6:18 PM

This is exactly my experience at Lovable. For some parts of the organization, LLMs are incredibly powerful and a productivity multiplier. For the team I am in, Infra, they're often a distraction and a negative multiplier.

I can't count how many times the LLM-proposed solution to jittery behavior is adding retries. At this point we have to be even more careful about controlling the implementation of things in the hot path.

I have to say though, giving Amp/Claude Code the Grafana MCP + read-only kubectl has saved me days' worth of debugging. So there's definitely trade-offs!
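The read-only part can be enforced with plain RBAC rather than trusting the agent; a minimal sketch, assuming a ServiceAccount for the agent (all names and the resource list are illustrative):

```yaml
# Hypothetical read-only access for an LLM agent's kubectl:
# only get/list/watch, no create/patch/delete.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: llm-agent-readonly
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "pods/log", "deployments", "jobs", "events"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: llm-agent-readonly-binding
subjects:
  - kind: ServiceAccount
    name: llm-agent        # identity the agent's kubeconfig points at
    namespace: default
roleRef:
  kind: ClusterRole
  name: llm-agent-readonly
  apiGroup: rbac.authorization.k8s.io
```

That way the agent can inspect state and logs freely, but any mutating suggestion still has to go through a human.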

Aperocky · today at 2:48 PM

The magic is testing. Having locally available, high-throughput testing with a large number of test cases now unlocks more speed.

The test cases themselves become the focus - the LLM usually can't get them right.

yojo · today at 3:17 PM

> - This is partly b/c it is good at things I'm not good at (e.g. front end design)

Everyone thinks LLMs are good at the things they are bad at. In many cases they are still just giving “plausible” code that you don’t have the experience to accurately judge.

I have a lot of frontend app dev experience. Even modern tools (Claude w/Opus 4.6 and a decent Claude.md) will slip unmaintainable slop into frontend changes. I catch cases multiple times a day in code review.

Not contradicting your broader point. Indeed, I think if you’ve spent years working on any topic, you quickly realize Claude needs human guidance for production quality code in that domain.

the__alchemist · today at 7:09 PM

I concur on the DevSecOps aspect for a more specific reason: if you're failing a pipeline because ThirdPartyTool69 doesn't like your code style or w/e, you can have the LLM fix it. Or get you to 100% test coverage etc. Or have it update your Cypress/Jest/SonarQube configs until the pipeline passes, without losing brain cells doing it by hand. Or have it find you a set of dependency versions that passes.

bauerd · today at 3:09 PM

>Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)

Absolutely. Tight feedback loops are essential to coding agents and you can’t run pipelines locally.

bojangleslover · today at 3:24 PM

What I do now is make an MVP with the AI and get it working. Then I tear it all down and start over, but go a little slower. Maybe tear it down again and go even more slowly, until I get to the point where I'm looking at everything the AI does and every line of code goes through me.

amelius · today at 4:28 PM

Also, now you're reading someone else's code and not everybody likes that. In fact, most self-proclaimed 10x coders I know hate it.

So instead of the 10x coder doing it, the 1x coder does it, but then that factor of 3x becomes 0.3x.

netbioserror · today at 5:55 PM

More generally: LLM effectiveness is inversely proportional to domain specificity. They are very good at producing the average, but completely stumble at the tails. Highly particular brownfield optimization falls into the tails.

baxtr · today at 3:24 PM

Isn’t that the reason why people advocate for spec-driven development instead of vibe coding?

quater321 · today at 3:43 PM

At this point, every programmer who claims that vibe coding doesn't make you at least 10 times more productive is simply lying or, worse, doesn't know how to vibe code. So, you want to tell me that you don't review the code you write? Or that others don't review it? You bring up ONE example with a bottleneck that has nothing to do with programming. Again, if you claim it doesn't make you 10x more productive, you don't know how to use AI, it is that simple. I spin up 10 agents: while 5 are working on apps, 5 do reviews and testing; I am at the end of that workflow and review the code WHILE the 10 agents keep working.

For me it is far more than 10x, but I'm being conservative for the noobs by saying 10x instead of 20x or more.
