Hacker News

01100011 · today at 5:44 AM · 7 replies

The other night I was too tired to code so I decided to try vibe coding a test framework for the C/C++ API I help maintain. I've tried this a couple times so far with poor results but I wanted to try again. I used Claude 3.5 IIRC.

The AI was surprisingly good at filling in some holes in my specification. It generated a ton of valid C++ code that actually compiled (except that it omitted the necessary #includes). I built and ran it and... the output was completely wrong.

OK, great. Now I have a few hundred lines of C++ I need to read through and completely understand to see why it's incorrect.

I don't think it will be a complete waste of time because the exercise spurred my thinking and showed me some interesting ways to solve the problem, but as far as saving me a bunch of time, no. In fact it may actually cost me more time trying to figure out what it's doing.

With all due respect to folks working on web and phone apps, I keep getting the feeling that AI is great for high level, routine sorts of problems and still mostly useless for systems programming.


Replies

sensanaty · today at 11:17 AM

> With all due respect to folks working on web and phone apps, I keep getting the feeling that AI is great for high level, routine sorts of problems and still mostly useless for systems programming.

As one of those folks, no it's pretty bad in that world as well. For menial crap it's a great time saver, but I'd never in a million years do the "vibe coding" thing, especially not with user-facing things or especially not for tests. I don't mind it as a rubber duck though.

I think the problem is that there are two groups of users: the technical ones like us, and then the managers and C-levels. They see it spit out a hundred lines of code in a second, and as far as they know (and care) it looks good, not realizing that someone now has to spend their time reviewing those 100 lines, plus carry the burden of maintaining them into the future. But all they see is a way to get the pesky, expensive devs replaced, or at least a chance to squeeze more out of them. The system is so flashy and impressive-looking that you can't even blame them for falling for the marketing and hype; after all, that's what all the AIs are being sold as: omnipotent and omniscient worker replacers.

Watching my non-technical CEO "build" things with AI was enlightening. He prompts it for something fairly simple, like a TODO List application. What it spits out works for the most part, but the only real "testing" he does is clicking on things once or twice and he's done and satisfied, now convinced that AI can solve literally everything you throw at it.

However if he were testing the solution as a proper dev would, he'd see that the state updates break after a certain amount of clicks, and that the list was glitching out sometimes, and that adding things breaks on scroll and overflows the viewport, and so on. These are all real examples of an "app" he made by vibe coding, and after playing around with it myself for all of 3 minutes I noticed all these issues and more in his app.

imiric · today at 7:37 AM

> With all due respect to folks working on web and phone apps, I keep getting the feeling that AI is great for high level, routine sorts of problems and still mostly useless for systems programming.

As someone working on routine problems in mainstream languages where training data is abundant, LLMs are not even great for that. Sure, they can output a bunch of code really quickly that on the surface appears correct, but on closer inspection it often uses nonexistent APIs, the logic is subtly wrong or convoluted for no reason, it does things you didn't tell it to do or ignores things you did, it has security issues and other difficult to spot bugs, and so on.

The experience is pretty much what you summed up. I've also used Claude 3.5 the most, though all other SOTA models have the same issues.

From there, you can go into the loop of copy/pasting errors to the LLM or describing the issues you did see in the hopes that subsequent iterations will fix them, but this often results in more and different issues, and it's usually a complete waste of time.

You can also go in and fix the issues yourself, but if you're working with an unfamiliar API in an unfamiliar domain, then you still have to do the traditional task of reading the documentation and web searching, which defeats the purpose of using an LLM to begin with.

To be clear: I don't think LLMs are a useless technology. I've found them helpful for debugging specific issues and for implementing small, specific pieces of functionality (i.e. as a glorified autocomplete). But any attempt at implementing large chunks of functionality, having them follow specifications, etc., has resulted in much more time and effort on my part than if I had done the work the traditional way.

The idea of "vibe coding" seems completely unrealistic to me. I suspect that many developers doing this are not even checking whether the code does what they want it to, let alone reviewing the code for any issues. As long as it compiles, they consider it a success. Which is an insane way of working that will lead to a flood of buggy and incomplete applications, increasing end users' dissatisfaction with our industry, and possibly causing larger effects not unlike the video game crash of 1983 or the dot-com bubble.

getnormality · today at 3:53 PM

> OK, great. Now I have a few hundred lines of C++ I need to read through and completely understand to see why it's incorrect.

For a time, we can justify this kind of extra work by imagining that it is an upfront investment. I think that is what a lot of people are doing right now. It remains to be seen whether AI-assisted labor is still a net positive once we stop giving it special grace as something that will pay off a lot later if we spend a lot of time on it now.

runeks · today at 12:08 PM

> With all due respect to folks working on web and phone apps, I keep getting the feeling that AI is great for high level, routine sorts of problems and still mostly useless for systems programming.

I agree. AI is great for stuff that's hard to figure out but easy to verify.

For example, I wanted to know how to lay out something a certain way in SwiftUI and asked Gemini. I copied what it suggested, ran it and the layout was correct. I would have spent a lot more time searching and reading stuff compared to this.

MSFT_Edging · today at 1:07 PM

I wish I had an ongoing counter for the number of times I've asked ChatGPT to "generate me python code that will output x data similar to xxd".

It's a snippet I've written a few times before to debug data streams, but it's always annoying to get the alignment just right.

I feel like that is the sweet spot for AI, to generate actual snippets of routine code that has no bearing on security or functionality, but lets you keep thinking about the problem at hand while it does that 10 minutes of busy work.
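The xxd-style snippet described above is roughly this much Python. This is a sketch of the general idea, not what ChatGPT produced; the `hexdump` name and the column widths are my choices, and real xxd has flags (byte grouping, column count) this doesn't replicate:

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes like xxd: offset, paired hex columns, ASCII gutter."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Group hex digits in pairs of bytes, like xxd (e.g. "4865 6c6c")
        hex_pairs = [chunk[i:i + 2].hex() for i in range(0, len(chunk), 2)]
        # Pad the hex column so short final lines keep the ASCII gutter aligned
        hex_col = " ".join(hex_pairs).ljust(width * 2 + width // 2 - 1)
        ascii_col = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_col}  {ascii_col}")
    return "\n".join(lines)

print(hexdump(b"Hello, world!\n"))
```

The `ljust` on the hex column is exactly the "get alignment just right" fiddliness the comment mentions: without it, the ASCII gutter drifts on the last line of the dump.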

sanderjd · today at 12:38 PM

Yeah, I similarly have not had great success for creating entire systems / applications for exactly this reason. I have had no success at all in not needing to go in and understand what it wrote, and when I do that, I find it largely needs to be rewritten. But I have a lot more success when I'm integrating it into the work I'm doing.

I do know people who seem to be having more success with the "vibecoding" workflow on the front end though.

panstromek · today at 7:18 AM

> OK, great. Now I have a few hundred lines of C++ I need to read through and completely understand to see why it's incorrect.

I think it's often better to just skip this and delete the code. The cool thing about these agents is that the cost of trying things out is extremely low, so you don't have to overthink it: if the result looks incorrect, just revert it and try something else.

I've been experimenting with Junie for the past few days and have had a very positive experience. It wrote a bunch of tests for me that I'd been postponing for quite some time, and it was mostly correct from a single-sentence prompt. Sometimes it does something incorrect, but I usually just revert and move on, then try something else later. There's definitely a sweet spot of tasks it does well, and you have to experiment a bit to find it.