Hacker News

fishtoaster · yesterday at 6:31 PM

Figuring out how to trust AI-written code faster is the project of software engineering for the next few years, IMO.

We'll need to figure out the techniques and strategies that let us merge AI code sight unseen. Some ideas that have already started floating around:

- Include the spec for the change in your PR and only bother reviewing that, on the assumption that the AI faithfully executed it

- Lean harder on your deterministic verification: unit tests, full stack tests, linters, formatters, static analysis

- Get better AI-based review: Greptile, Bugbot, and half a dozen others

- Lean into your observability tooling so that AIs can fix your production bugs so fast they don't even matter.
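The second item is the most mechanical of the four. As a minimal sketch (the tool names are assumptions; substitute whatever your repo actually runs), a merge gate is just "every deterministic check exits 0":

```python
import subprocess

# Hypothetical check commands; swap in whatever your repo actually uses.
CHECKS = [
    ["ruff", "check", "."],   # linter
    ["mypy", "src"],          # static analysis
    ["pytest", "-q"],         # unit and full-stack tests
]

def gate(checks=CHECKS, run=subprocess.run):
    """Allow a merge only if every deterministic check exits 0."""
    return all(run(cmd).returncode == 0 for cmd in checks)
```

The point isn't the specific tools, it's that every check in the list is deterministic, so a green gate means the same thing for AI-written code as for human-written code.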

None of these seem fully sufficient right now, but it's such a new problem that I suspect we'll be figuring this out for the next few years at least. Maybe one of these becomes the silver bullet or maybe it's just a bunch of lead bullets.

But anyone who's able to ship AI code without human review (and without their codebase collapsing) will run circles around the rest.


Replies

sarchertechyesterday at 6:56 PM

Translating from a natural language spec to code involves a truly massive amount of decision making.

For a non trivial program, 2 implementations of the same natural language spec will have thousands of observable differences.

Where we are today, with agents requiring guardrails to keep them from spinning out, there is no way to let them work on code autonomously without all of those observable differences constantly shifting, resulting in unusable software.

Tests can’t prevent this, because for a test suite to cover all observable behavior it would need to be more complex than the code itself. In which case it wouldn’t be any easier for a machine or a human to understand.
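A tiny illustration of the gap (the spec and data here are made up): two implementations of "return the users ordered by age" that both pass the spec-level test, yet disagree on an observable detail nobody wrote down.

```python
# Two implementations of the same natural-language spec ("return the
# users ordered by age"). Both pass the spec-level test below, yet they
# disagree on tie order: an observable difference nobody specified.

def impl_a(users):
    # Python's sort is stable, so ties keep their input order
    return sorted(users, key=lambda u: u["age"])

def impl_b(users):
    # ties broken alphabetically by name
    return sorted(users, key=lambda u: (u["age"], u["name"]))

def spec_test(impl):
    users = [{"name": "bo", "age": 30}, {"name": "al", "age": 30}]
    out = impl(users)
    assert [u["age"] for u in out] == [30, 30]  # all the spec asked for
    return [u["name"] for u in out]
```

Multiply that one unspecified decision by thousands per program, and an agent regenerating code from the spec will shuffle them on every pass.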

The only solution to this problem is for LLMs to get better. Personally, I think that at the point they can pull this off, they can do any white-collar job, and there’s no point in planning for that future because it results in either Mad Max or Star Trek.

gopalv · yesterday at 8:40 PM

> We'll need to figure out the techniques and strategies that let us merge AI code sight unseen

Every strategy which worked with an off-shore team in India works well for AI.

Sometime in mid 2017, I found myself running out of hours in the day stopping code from being merged.

On one hand, I needed to stamp the PRs because I was an ASF PMC member and not many of the folks opening JIRAs were, and this wasn't a tech-debt-friendly culture: someone from LinkedIn or Netflix or EMR could say "Your PR is shit, why did you merge it?" and "Well, we had a release due in 6 days" is not an answer.

Claude has been a drop-in replacement for the same problem, where I have to exercise exactly the same muscles, though it's a lot easier because I can tell the AI "This is completely wrong, throw it away and start over" without involving Claude's manager in the conversation.

The manager conversations were warranted and I learned to be nicer two years into that experience [1], but it's a soft skill which I no longer use with AI.

Every single method that worked with a remote team in a different timezone works with AI for me, and perhaps better, because they're all clones of the best available: specs, pre-commit verifiers, mandatory reviews by someone uncommitted to the deadline, ease of reproducing bugs outside production, and less clever code overall.

[1] - https://notmysock.org/blog/2018/Nov/17/

ahsisjb · yesterday at 7:07 PM

> Figuring out how to trust AI-written code faster is the project of software engineering for the next few years, IMO

Replace "AI-written" with "cheap-dev-written" and think about why that isn't already true.

The bottleneck is a competent dev understanding a project. Always has been.

Another fundamental flaw is you can’t trust LLMs. It’s fundamentally impossible compared to the way you trust a human. Humans make mistakes. LLMs do not. Anything “wrong” they do is them working exactly as designed.

dwb · yesterday at 8:03 PM

What is this obsession with specifications? For a start, it's certainly not fair to assume an LLM has translated one into correct code, even if there is only one reasonable way to do so, and there probably isn't. I like a good, well-targeted spec as much as anyone, but come on. A spec detailed enough to describe a program is more or less the program, just written in a non-executable language. I want to review the code, not a spec.

orsorna · yesterday at 6:36 PM

>Lean harder on your deterministic verification: unit tests, full stack tests, linters, formatters, static analysis

It's wild that so many of the PRs being zipped around don't even run these. You would run such validations as a human...

bigstrat2003 · yesterday at 8:42 PM

> Figuring out how to trust AI-written code faster is the project of software engineering for the next few years, IMO.

Or we could actually, you know, stop using a tool that doesn't work. People are so desperate to believe in the productivity boosts of AI that they are trying to contort the whole industry around a tool that is bad at its job, rather than going "yeah that tool sucks" and moving on like a sane person would.

pjm331 · yesterday at 7:38 PM

My bet is that the last item is what we’ll end up leaning heavily on - feels like the path of least resistance

Throw in some simulated user interactions in a staging environment with a bunch of agents acting like customers a la StrongDM so you can catch the bugs earlier
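A toy sketch of what those scripted "customer" agents might look like (the endpoints and step names are made up; the client here is just a callable, but in practice it would wrap real HTTP calls against staging):

```python
# Scripted "customer" personas replay journeys against a staging
# environment and report the first failing step. The client is a
# callable returning True/False; endpoints are hypothetical.

CHECKOUT_JOURNEY = [
    ("browse",   ("GET", "/products")),
    ("add",      ("POST", "/cart")),
    ("checkout", ("POST", "/checkout")),
]

def run_journey(client, steps):
    """Replay one persona's steps; return the first failing step, or None."""
    for name, request in steps:
        if not client(request):
            return name
    return None
```

Run a few dozen personas like this after every deploy and you catch the bug while it's still in staging, which is the whole point.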

gspr · yesterday at 9:51 PM

> We'll need to figure out the techniques and strategies that let us merge AI code sight unseen.

Why do you assume that's doable? I'm not saying it's not, but it seems strange to just take for granted that it is.

zer00eyz · yesterday at 6:51 PM

> Include the spec for the change in your PR

We would have to get very good at these. It's completely antithetical to the agile idea where we convey tasks via pantomime and post-it notes rather than formal requirements. I won't even get started on the lack of inline documentation and its ongoing disappearance.

> Lean harder on your deterministic verification: unit tests, full stack tests,

Unit tests are so very limited. Effective but not the panacea that the industry thought it was going to be. The conversation about simulation and emulation needs to happen, and it has barely started.

> We'll need to figure out the techniques and strategies that let us merge AI code sight unseen.

Most people who write software are really bad at reading others' code and at systems-level thinking. This starts at hiring: the leetcode interview has stocked our industry with people who have never been vetted or measured on these skills.

> But anyone who's able to ship AI code without human review

Imagine we made everyone go back to the office, and then randomly put LSD in the coffee maker once a week. The hallucination problem is always going to be non-zero. If you are bundling the context in, you might not be able to limit it (short of using two models adversarially). That doesn't even deal with the "confidently wrong" issue: what's an LLM going to do with something like this: https://news.ycombinator.com/item?id=47252971 (random bit flips)?

We haven't even talked about the human factors (bad product ideas, poor UI, etc.) that engineers push back against and an LLM likely won't.

That doesn't mean you're completely wrong: those who embrace AI as a power tool, and use it to build their app, and tooling that increases velocity (on useful features) are going to be the winners.

user3939382 · yesterday at 7:04 PM

I made a distributed operating system that manages all of this. Not just for agents per se but in general allows many devs to work simultaneously without tons of central review and allows them to keep standards high while working independently.

gjsman-1000 · yesterday at 6:38 PM

Do you know what happens to every industry when they get too fast and slapdash?

Regulation.

It happened with plumbing. Electricians. Civil engineers. Bridge construction. Haircutting. Emergency response. Legal work. Tech is perhaps the least regulated industry in the world: cutting someone's hair requires a license, operating a commercial kitchen requires a license, but holding the SSNs of 100K people does not, yet.

If AI is fast and cheap, some big client will use it in a stupid manner. Tons of people can and will be hurt afterward. Regulation will follow. AI means we can either go faster, or focus on ironing out every last bug with the time saved, and politicians will focus on the latter instead of allowing a mortgage meltdown in the prime credit market. Everyone stays employed while the bar goes higher.
