Hacker News

When AI writes the software, who verifies it?

97 points by todsacerdoti today at 4:34 PM | 86 comments

Comments

phyzome today at 11:23 PM

Verify? Seems like no one is even reviewing this stuff.

madrox today at 7:51 PM

I encourage everyone to RTFA and not just respond to the headline. This really is a glimpse into where the future is going.

I've been saying "the last job to be automated will be QA" and it feels more true every day. It's one thing to be a product engineer in this era. It's another to be working at the level the author is, where code needs to be verifiable. However, once people stop vibing apps and start vibing kernels, it really does fundamentally change the game.

I also have another saying: "any sufficiently advanced agent is indistinguishable from a DSL." I hadn't considered Lean in this equation, but I put these two ideas together and I feel like we're approaching some world where Lean eats the entire agentic framework stack and the entire operating system disappears.

If you're thinking about building something today that will still be relevant in 10 years, this is insightful.

muraiki today at 7:03 PM

The article says that AWS's Cedar authorization policy engine is written in Lean, but it's actually written in Dafny. Writing Dafny is a lot closer to writing "normal" code than to the proofs you see in Lean. As a non-mathematician I gave up pretty early in the Lean tutorial, while in a recent prototype I learned enough Dafny to be semi-confident in reviewing Claude's Dafny code in about half a day.

The Dafny code formed a security kernel at the core of a service, enforcing invariants such as requiring that an audit log be written before any mutating operation is performed. Of course I still had bugs, usually from specification problems (poor spec/design) or from Claude not taking the proof far enough (proving a property for only one of several related types, which could also have been a specification problem on my part).

In the end I realized I'm writing a bunch of I/O bound glue code, and plain ol' test-driven development was fine enough for my threat model. I can review Python code more quickly and accurately than Dafny (or the Go code it eventually had to link to), so I'm back to optimizing for humans again...
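A runtime sketch of the kind of invariant described above (audit log written before every mutation). Unlike the Dafny version, this is only checked at runtime rather than proven statically, and all names here are made up for illustration:

```python
class AuditedStore:
    """Toy key-value store where every mutating operation must be
    preceded by an audit-log write. A Dafny proof would establish this
    invariant for all executions; here it is merely asserted at runtime."""

    def __init__(self):
        self.audit_log = []
        self.data = {}
        self._mutations = 0

    def _mutate(self, entry, apply):
        self.audit_log.append(entry)   # audit first...
        apply()                        # ...then mutate
        self._mutations += 1
        # Invariant: every mutation has a corresponding audit entry.
        assert len(self.audit_log) >= self._mutations

    def set(self, key, value):
        self._mutate(("set", key, value),
                     lambda: self.data.__setitem__(key, value))
```

The structural point is the same as in the comment: the ordering guarantee lives in one small kernel (`_mutate`), so reviewers only need to check that all mutations go through it.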

dataviz1000 today at 10:54 PM

100% of my innovation for the past month has been getting the coding agent to iterate with an OODA loop (I make it validate after the act step), trying to figure out how to get it not to stop iterating.

For example, I have discovered there is a big difference between prompting 'there is a look-ahead bias' and 'there is a [T+1] look-ahead bias', where the latter will cause it to not stop until it finds the [T+1] look-ahead bias. It will start to write scripts that `.shift(1)` all values and do statistical analysis on the result set trying to find the look-ahead bias.

Now, I know there isn't a look-ahead bias, but the point is I was able to get it to iterate automatically, trying different approaches to solve the problem.
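A hypothetical sketch of the kind of `.shift(1)` script described: aligning a signal column against neighboring periods' returns to hunt for a T+1 leak. The column names and data are made up for illustration:

```python
import pandas as pd

# Toy frame: a trading "signal" and per-period returns.
df = pd.DataFrame({
    "signal": [0.1, 0.3, -0.2, 0.4, 0.0, 0.2],
    "ret":    [0.2, -0.1, 0.5, 0.1, -0.3, 0.4],
})

# Align each signal with the *following* period's return. A signal built
# only from past data should not correlate suspiciously well with ret[t+1].
future_ret = df["ret"].shift(-1)
corr_future = df["signal"].corr(future_ret)

# The shift(1) variant: lag the signal instead, which is the correct
# alignment if the signal is only actionable at the next bar.
lagged_signal = df["signal"].shift(1)
corr_lagged = lagged_signal.corr(df["ret"])

print(corr_future, corr_lagged)
```

A large gap between the two correlations is the kind of statistical evidence the agent keeps iterating on.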

The software is going to verify itself eventually, sooner rather than later.

_pdp_ today at 6:14 PM

I think the issue goes even deeper than verification. Verification is technically possible. You could, in theory, build a C compiler or a browser and use existing tests to confirm it works.

The harder problem is discovery: how do you build something entirely new, something that has no existing test suite to validate against?

Verification works because someone has already defined what "correct" looks like. There is possibly a spec, or a reference implementation, or a set of expected behaviours. The system just has to match them.

But truly novel creation has no ground truth to compare against and no predefined finish line. You are not just solving a problem. You are figuring out what the problem even is.

drivebyhooting today at 11:04 PM

That was a prolix and meandering essay. Next time I’d rather just look at the prompts and hand edits that went into writing it rather than the final artifact; much like reviewing the documentation, spec, and proof over the generated code as extolled by the article.

chromaton today at 10:15 PM

TFA seems to be big on mathematical proof of correctness, but how do you ever know you're proving the right thing?

ozten today at 10:40 PM

This is really great and important progress, but Lean is still an island floating in space. Too hard to get actual work done building any real world system.

oakpond today at 6:04 PM

You do. Even the latest models still frequently write really weird code. The problem is some developers now just submit code for review that they didn't bother to read. You can tell. Code review is more important than ever imho.

boznz today at 8:05 PM

This is the biggest problem going forward. I've written about it many times on my blog, in talks, and as premises in my sci-fi novels.

Sitting in your cubicle with your perfect set of test suites, code verification rules, SOPs, and code reviews, you won't want to hear this, but other companies will be gunning for your market, writing almost identical software to yours from a series of prompts that generate the code they want: fast, cheap, functionally identical, and quite possibly untested.

As AIs get more proficient and are given more autonomy (OpenClaw++), they will also generate directly executable binaries, completely replacing the compiler and making the output unreadable to a normal human, and they may even do this without prompts.

The scenario is terrifying to professional software developers, but other people will do this regardless of what you think, and run it in production, and I expect we are months or just a few years away from this.

Source code of the future will be the complete series of prompts used to generate the software, another AI to verify it, and an extensive test suite.

yoaviram today at 7:16 PM

I just finished writing a post about exactly this. Software development, as the act of manually producing code, is dying. A new discipline is being born. It is much closer to proper engineering.

Like an engineer overseeing the construction of a bridge, the job is not to lay bricks. It is to ensure the structure does not collapse.

The marginal cost of code is collapsing. That single fact changes everything.

https://nonstructured.com/zen-of-ai-coding/

anonhacker199 today at 10:22 PM

The biggest issue right now is most AI tools aren't hooked up appropriately to an environment they can test in (Chrome typically). Replit works extremely well because it has an integrated browser and testing strategy. AI works very well when it has the ability to check its own work.

50lo today at 7:54 PM

One thing that seems under-discussed in this space is the shift from verifying programs to verifying generation processes.

If a piece of code is produced by an agent loop (prompt -> tool calls -> edits -> tests), the real artifact isn’t just the final code but the trace/pipeline that produced it.

In that sense verification might look closer to: checking constraints on the generator (tests/specs/contracts), verifying the toolchain used by the agent, and replaying generation under controlled inputs.

That feels closer to build reproducibility or supply-chain verification than traditional program proofs.
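A minimal sketch of what "verifying the trace" could look like, assuming a hypothetical trace format (the step schema and field names below are made up): hash the ordered sequence of prompts, tool calls, and edits into one digest, the way reproducible-build systems fingerprint a pipeline.

```python
import hashlib
import json

def trace_digest(steps):
    """Hash an ordered agent trace (prompt, tool calls, edits, test runs)
    into a single digest. A reviewer who replays the same pipeline under
    controlled inputs should reproduce the same digest."""
    h = hashlib.sha256()
    for step in steps:
        # Canonical JSON so logically identical steps hash identically.
        h.update(json.dumps(step, sort_keys=True).encode("utf-8"))
    return h.hexdigest()

# Illustrative trace; real agent frameworks would emit richer records.
trace = [
    {"kind": "prompt", "text": "add input validation"},
    {"kind": "tool_call", "name": "run_tests", "exit_code": 0},
    {"kind": "edit", "file": "api.py", "summary": "validate request body"},
]
```

Any tampering with an intermediate step (or a different toolchain producing different steps) changes the digest, which is the supply-chain-style property the comment is pointing at.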

bryanlarsen today at 8:13 PM

You can use AI to make a reviewer's job much easier. Add documents, divide your MR into reviewable chunks, et cetera.

If reviewing is the expensive part now, optimize for reviewability.

mentalgear today at 7:57 PM

> Where this leads is clear. Layer by layer, the critical software stack will be reconstructed with mathematical proofs built in. The question is not whether this happens, but when.

slopinthebag today at 10:31 PM

LLM generated code combined with formal verification just feels like we're entering the most ridiculous timeline. We know formal verification doesn't work at scale, hence we don't use it. We might get fully vibe coded systems but we sure as hell won't be able to verify them.

The collapse of civilisation is real.

holtkam2 today at 6:21 PM

At the end of the day you need humans who understand the business critical (or safety critical) systems that underpin the enterprise.

Someone needs to be held accountable when things go wrong. Someone needs to be able to explain to the CEO why this or that is impossible.

If you want to have AI generate all the code for your business critical software, fine, but you better make sure you understand it well. Sometimes the fastest path to deep understanding is just coding things out yourself - so be it.

This is why the truly critical software doesn’t get developed much faster when AI tools are introduced. The bottleneck isn’t how fast the code can be created, it’s how fast humans can construct their understanding before they put their careers on the line by deploying it.

Ofc… this doesn't apply to prototypes, hackathons, POCs, etc. For those "low stakes" projects, vibe code away, if you wish.

sslayer today at 8:40 PM

State-sponsored hackers' AI will verify it.

nemo44x today at 10:43 PM

I believe the old ways, which agile destroyed, will come back because the implementation isn’t the hardest part now. Agile recognized correctly that implementation was the hard part to predict and that specification through requirements docs, UML, waterfall, etc. were out of date by the time the code was cooked.

I don’t think we’ll get those exact things back but we will see more specification and design than we do today.

indymike today at 6:33 PM

Because of the scale of generated code, often it is the AI verifying the AI's work.

simonw today at 6:50 PM

The "Nearly half of AI-generated code fails basic security tests" link provided in this piece is not credible in my opinion. It's a very thinly backed vendor report from a company selling security scanning software.

acedTrex today at 5:53 PM

No one does currently, and it's going to take a few very painful and high-profile failures of vital systems for this industry to RELEARN its lesson about the price of speed.

In fact it will probably need to happen a few times PER org for the dust to settle. It will take several years.

lgl today at 6:26 PM

I'm in the process of building v2.0 of my app using opus 4.6 and largely agree with this.

It's pretty awesome but still does a lot of basic idiotic stuff. I was implementing a feature that required a global keyboard shortcut and asked Opus to define it, taking into account that it shouldn't clash with common shortcuts. It built a field where only one modifier key was required. After I pointed out that this was unsafe, since users could just define Ctrl+C as the shortcut, and that we needed more safeguards and at least two modifier keys, I got the usual "you're absolutely right" and it proceeded to require two modifier keys. But then it also created a huge blacklist of common shortcuts like copy, cut, paste, print, select all, etc., basically a bunch of single-modifier shortcuts. Once I mentioned that since we're already forcing two modifier keys the blacklist is useless, it said I'm right again and fixed it.
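The two-modifier rule from the anecdote is only a few lines to state directly. A hedged sketch (the function and key names are made up for illustration, not the app's actual code):

```python
MODIFIERS = {"ctrl", "alt", "shift", "meta"}

def is_safe_global_shortcut(keys):
    """Require at least two distinct modifier keys plus exactly one
    regular key. This alone rules out single-modifier combos like
    Ctrl+C, so no hand-maintained blacklist of common shortcuts
    (copy, paste, print, ...) is needed."""
    pressed = [k.lower() for k in keys]
    mods = {k for k in pressed if k in MODIFIERS}
    regular = [k for k in pressed if k not in MODIFIERS]
    return len(mods) >= 2 and len(regular) == 1
```

The design point is that a structural constraint subsumes the enumerated blacklist, which is exactly the redundancy the commenter had to point out twice.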

The counterpoint to this idiocy is that it's very good overall at a lot of what is (in my mind) much more complicated stuff. It's a .NET app, and stuff like creating models, viewmodels, usercontrols, and setting up the entire hosting DI with pretty much all the .NET best practices, it does pretty awesomely.

tl;dr is that training wheels are still mandatory imho

bitwize today at 7:20 PM

Also AI.

righthand today at 5:42 PM

No one, really. Code is for humans to read and for machines to compile and execute. LLMs are enabling people to just write the code and not have anyone read it. It's solving a problem that didn't really exist (we already had code generators before LLMs).

It’s such an intoxicating copyright-abuse slot machine that a buddy who is building an ocaml+htmx tree editor told me “I always get stuck and end up going to the llm to generate code. Usually when I get to the html part.” I asked if he used a debugger before that, he said “that’s a good idea”.

aplomb1026 today at 6:32 PM

[dead]

bdcravens today at 10:40 PM

The same ones who verify it when I write it: my customers in production! /s (well, maybe /s)

foolfoolz today at 5:56 PM

no one wants to believe this but there will be a point soon when an ai code review meets your compliance requirements to go to production. is that 2026? no. but it will come

rademaker today at 5:16 PM

In his latest essay, Leonardo de Moura makes a compelling case that if AI is going to write a significant portion of the world’s software, then verification must scale alongside generation. Testing and code review were never sufficient guarantees, even for human-written systems; with AI accelerating output, they become fundamentally inadequate. Leo argues that the only sustainable path forward is machine-checked formal verification — shifting effort from debugging to precise specification, and from informal reasoning to mathematical proof checked by a small, auditable kernel. This is precisely the vision behind Lean: a platform where programs and proofs coexist, enabling AI not just to generate code, but to generate code with correctness guarantees. Rather than slowing development, Lean-style verification enables trustworthy automation at scale.
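As a toy illustration (not taken from the essay) of what "programs and proofs coexist" means in Lean, here is a definition alongside facts about it that the kernel checks by computation:

```lean
-- A program: an ordinary executable function.
def double (n : Nat) : Nat := n + n

-- A proof about that program, checked by Lean's small trusted kernel.
theorem double_eq (n : Nat) : double n = n + n := rfl

-- Concrete instances also check definitionally.
example : double 21 = 42 := rfl
```

Real verified code proves far richer properties than these, but the shape is the same: the function and the machine-checked statement about it live in one artifact, so generating code and generating its correctness guarantee become a single task.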