> - coding is a verifiable domain
You're missing the point though. "1 + 1" vs "one.add(1)" might both be "passable" and correct, but that misses the forest for the trees: how do you know which one is the right choice long-term, given what we know? That's the engineering part of building software, as opposed to the "coding" part, which tends to be the easy one.
How do you evaluate, score, and/or benchmark something like that? Currently, I don't think we have any methodology for it, probably because it's pretty subjective in the end. That's where the "creative" parts of software engineering become more important, and they're also way harder to verify.
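To make the "1 + 1" vs "one.add(1)" point concrete, here's a minimal sketch (the `Counter` class and its API are invented for illustration): both implementations pass the exact same test, and the test says nothing about which design is the right long-term choice.

```python
# Both implementations below are "verifiably correct" against the same
# assertion, yet they embody different design choices. The Counter class
# is made up for illustration.

class Counter:
    def __init__(self, value: int) -> None:
        self.value = value

    def add(self, n: int) -> "Counter":
        return Counter(self.value + n)

def increment_plain(x: int) -> int:
    return x + 1                      # direct arithmetic

def increment_wrapped(x: int) -> int:
    return Counter(x).add(1).value    # same result via an abstraction

# The verifiable part: both pass identically.
assert increment_plain(1) == increment_wrapped(1) == 2
# The engineering part -- which approach serves the codebase long-term --
# is exactly what this test cannot tell you.
```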
Because it has business context and better reasoning, and it can ask humans for clarification and take direction.
You don't need to benchmark this, although benchmarking is important. We have clear scaling laws for true statistical performance, and that performance is monotonically related to any notion of what "performance" means.
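One way to read that monotonicity claim (a toy sketch with made-up scores, not anyone's real data): if the metric you actually care about is any strictly increasing function of statistical performance, then improving the latter improves the former, and model rankings agree under both.

```python
# Toy illustration of the monotonicity claim. The scores and the mapping
# are arbitrary inventions; only the ordering argument matters.

def downstream_metric(statistical_perf: float) -> float:
    # Any strictly increasing mapping works; this one is arbitrary.
    return 100 * statistical_perf ** 0.5

models = {"small": 0.61, "medium": 0.74, "large": 0.88}  # made-up scores

ranked_by_perf = sorted(models, key=models.get)
ranked_by_metric = sorted(models, key=lambda m: downstream_metric(models[m]))
assert ranked_by_perf == ranked_by_metric  # the ordering is preserved
```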
I do benchmarks for a living and can attest: benchmarks are bad, but it doesn't matter for the point I'm trying to make.
While I agree we don't have any methodologies for this, it's also true that we can now afford to just "fail" more often.
Code is effectively becoming cheap, which means even bad design decisions can be overturned without prohibitive costs.
I wouldn't be surprised if in a couple of years we see several projects that approach the problem of tech debt like this:
1. Instruct an AI to write tens of thousands of tests using the available information: documentation, requirements, meeting transcripts, etc. These tests MUST include performance AND availability tests (along with other "quality attribute" concerns); see the sketch after this list.
2. Have humans verify, to the best of their ability, that the tests are correct (this step is likely optional).
3. Ask another AI to re-implement the project while matching the tests.
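For step 1, a hypothetical example of two generated tests out of those tens of thousands. The order API, the 50 ms budget, and `create_order` itself are all invented for illustration; an availability test would follow the same pattern with a simulated dependency failure.

```python
# Hypothetical AI-generated tests. In practice the expectations would be
# derived from docs, requirements, and meeting transcripts; create_order
# is a stand-in for whatever the second AI re-implements.
import time

def create_order(items):
    # Stand-in for the real system under test.
    return {"id": 1, "items": list(items), "status": "confirmed"}

def test_order_creation_is_correct():
    order = create_order(["widget"])
    assert order["status"] == "confirmed"
    assert order["items"] == ["widget"]

def test_order_creation_meets_latency_budget():
    # "Quality attribute" test: performance, not just correctness.
    start = time.perf_counter()
    create_order(["widget"])
    elapsed = time.perf_counter() - start
    assert elapsed < 0.050, "order creation exceeded the 50 ms budget"
```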
It sounds insane, but...not so insane if you think we will soon have models better than Opus 4.6. And given the things I've personally done with it, I find it less insane as the days go by.
I do agree with the original poster who said that software is moving in this direction, where iteration is super fast and non-developers can quickly get features in front of them, at least as a demo. I think it clearly is, and I'm working internally to make it a reality: you submit a feature request, and eventually a live demo is ready for you, deployed in isolation on some internal server, proxied appropriately if you need a URL, and ready for you to give feedback and have the AI iterate on it. That works for the kind of projects we have, and though I get it might be trickier for much larger systems, I'm sure everyone will find a way.
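A minimal sketch of what that request-to-demo loop could look like. Every function here is a stand-in stub, and names like `demos.internal` are invented; this is not anyone's actual infrastructure.

```python
# Sketch of a feature-request -> live-demo loop. All functions are stubs;
# the real agent calls, build, deployment, and proxying are
# infrastructure-specific and not shown.
import tempfile

def clone_project(workdir: str) -> None:
    # Stub: in reality, check out the project (e.g. `git clone`) into workdir.
    print(f"[repo] cloning project into {workdir}")

def generate_change(request: str, workdir: str) -> None:
    # Stub: in reality, an agent edits the code in workdir until the
    # feature request is satisfied.
    print(f"[agent] implementing {request!r} in {workdir}")

def deploy_isolated(workdir: str, slug: str) -> str:
    # Stub: in reality, build and run this branch on an internal server,
    # then register a reverse-proxy route so reviewers get a URL.
    url = f"https://demos.internal/{slug}"  # invented hostname
    print(f"[deploy] {workdir} -> {url}")
    return url

def handle_feature_request(request: str, slug: str) -> str:
    workdir = tempfile.mkdtemp(prefix=f"demo-{slug}-")
    clone_project(workdir)
    generate_change(request, workdir)
    return deploy_isolated(workdir, slug)

url = handle_feature_request("add CSV export to the reports page", "csv-export")
print("demo ready for feedback:", url)
```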
For now, we still need engineers to help drive many decisions, and I think that'll remain the case. These days, all I do when "coding" is talk (via TTS) with Opus 4.6 and iterate on several plans until we land on the right one, and I can't wait to see how much better this workflow gets with smarter and faster models.
I'm personally trying to adapt everything in our company to have agents work with our code in the most frictionless way we can think of.
Nonetheless, I do think engineers with a product inclination are better off than those who are purely about coding and building systems. To me, it has never felt so magical to build a product, and I'm loving it.