spicyusername • yesterday at 11:45 AM • 25 replies

A lot of people down on AI in this thread, but I'm watching the industry slip over the line of trust with these latest frontier models. GPT 5.5 is the first model good enough for me to just let rip.

Every jira ticket I see now has acceptance criteria, reproduction steps, and detailed information about why the ticket exists.

Every commit message now matches the repo style, and has detailed information about what's contained in the commit.

Every MR now has detailed information about what's being merged.

Every code base in the teams around me now has 70 to 90%+ code coverage.

Every line of code now comes with best practices baked in, helpful comments, and optimized hot paths.

I regularly ship four features at a time now across multiple projects.

The MCP has now automated away all of the drudgery of programming, from summarizing emails, to generating confluence documentation, to generating slide decks.

People keep screaming that tech debt is going to pile up, but I think it's going to be exactly the opposite. Software is going to pile up because developing it is now cheap.

Most code before LLMs sucked. Most projects I onboarded to were a massive ball of undocumented spaghetti, written by humans. The floor has been raised significantly as to what bad code can even look like, and fixing issues is now basically free if your company is willing to shell out for tokens.

Replies

HarHarVeryFunny • yesterday at 12:59 PM

> Software is going to pile up because developing it is now cheap.

Software to do what, though?!

Coding, maybe 10% of a developer's job (Brooks' "No Silver Bullet" estimates 1/6), was never the bottleneck, and even if you automated it away entirely you would only have reduced development time by 10% (assuming you are not doing human code review etc).

I would also argue that software development as a whole (not just the coding part) was also typically never the bottleneck to companies shipping product faster, maybe also not for automating their business faster (internal IT systems), since the rest of the company is not moving that fast, business needs are not changing that fast, and external factors that might drive change are not moving that fast either.

I think that when the dust settles we'll find that LLM-assisted coding has had far less impact than those trying to sell it to us are forecasting. There will be exceptions of course, especially in terms of what a lone developer can do, or how fast a software startup can get going, but in terms of impact to larger established companies I expect not so much.
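
The 10% arithmetic earlier in this comment is an instance of Amdahl's law. A minimal sketch, assuming coding is a fixed share of total development time and everything else is unaffected; the function names are made up for illustration:

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: whole-job speedup when only the coding share gets faster."""
    remaining = (1.0 - coding_fraction) + coding_fraction / coding_speedup
    return 1.0 / remaining


def time_saved(coding_fraction: float, coding_speedup: float) -> float:
    """Fraction of total development time removed."""
    return 1.0 - 1.0 / overall_speedup(coding_fraction, coding_speedup)


if __name__ == "__main__":
    # float("inf") models coding being automated away entirely.
    for label, fraction in [("10% coding", 0.10), ("1/6 coding", 1 / 6)]:
        saved = time_saved(fraction, float("inf"))
        print(f"{label}: at most {saved:.1%} of total time saved")
```

Even with infinitely fast coding, the saving is capped at the coding share itself: 10% in the first case, about 16.7% with the 1/6 estimate.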

banannaise • yesterday at 3:22 PM

The ticket has subtle errors in its description that are only caught by someone experienced with the codebase.

The code hides an exception behind an if-then-else that defaults to the most common state, which isn't caught until it breaks things for the 1% of users who don't have that state.

The new feature quietly breaks a feature not covered by the acceptance tests.

The documentation is four times as long and nobody who relies on it can read it.

And I'm stuck spending my time going over tickets with a fine-toothed comb, reviewing PRs, and mentoring contributors to prevent all of this garbage from ending up in the live code.
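
The if-then-else problem above might look something like this. This is a hypothetical sketch, not code from any real project; the plan names and seat limits are invented:

```python
from enum import Enum


class Plan(Enum):
    FREE = "free"
    PRO = "pro"
    ENTERPRISE = "enterprise"


def seats_lenient(plan: str) -> int:
    # Anti-pattern: anything that is not "pro" silently falls through
    # to the most common state, so unknown or rare values never raise.
    if plan == "pro":
        return 10
    else:
        return 1


def seats_strict(plan: str) -> int:
    # Parse first: Plan(...) raises ValueError on an unrecognized value,
    # so the rare case fails loudly instead of being treated as "free".
    limits = {Plan.FREE: 1, Plan.PRO: 10, Plan.ENTERPRISE: 100}
    return limits[Plan(plan)]
```

Both versions pass a test suite that only exercises "free" and "pro". Only the strict one makes the rare state visible before it reaches users.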

neya • yesterday at 1:01 PM

What you are describing is the role of a manager, not a software engineer. Software engineering has very little to do with writing code and much more to do with architecting, at a higher level, what needs to be done. The code is just the execution part. LLMs can code? Ok good. Without a clear architectural pathway / direction, that code is just useless. It's not tech debt. It's just a bunch of random strings. You can argue that Claude Code and others do create a plan of attack, but that plan is still at the execution level, not the architectural level.

To me, architecture starts all the way from the top. Even before you write a single line of code, you do the DDD (Domain-Driven Design), then create a set of rulesets (e.g. use the domain name as table prefix) and contexts, and then define the functionality w.r.t. that architecture. LLMs can do all this, but only if you ask them to explicitly. So they are pretty useful to brainstorm with, but they cannot reliably design autonomously, push to production with your eyes closed, and support a 100,000-user base. It's a far cry from that.

But sure, you can upsell to management about the vanity metrics like lines of code and get that promotion with LLM. But, it's still not software engineering.
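
A ruleset like the table-prefix example above only helps if something checks it. A rough sketch of such a check, assuming tables are named `<domain>_<entity>`; the function name and the sample table names are hypothetical:

```python
def tables_missing_prefix(domain: str, tables: list[str]) -> list[str]:
    """Return the table names that do not start with '<domain>_'."""
    prefix = f"{domain}_"
    return [name for name in tables if not name.startswith(prefix)]


if __name__ == "__main__":
    # Invented table names for a hypothetical "billing" domain.
    offenders = tables_missing_prefix(
        "billing", ["billing_invoice", "billing_payment", "user_refunds"]
    )
    print(offenders)
```

Run as part of CI, a check like this turns the architectural rule into something both humans and LLM-generated migrations have to satisfy.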

duskdozer • yesterday at 1:08 PM

> I regularly ship four features at a time now across multiple projects.

Well, this explains why so much software nowadays is so slow, buggy, and chaotic.

alrtkh • yesterday at 12:04 PM

For people who like to tick boxes, which is essentially most of the above, AI is welcome. That includes managers.

It still has nothing to do with software engineering. All good code was written by humans. AI takes it, plagiarizes it, launders it and repackages it in a bloated form.

Whenever I look deeply at an AI plagiarized mess, it looks like it is 90% there but in reality it is only 50%. Fixing the mess takes longer than writing it oneself.

p2detar • yesterday at 12:12 PM

> I regularly ship four features at a time now across multiple projects.

Can that happen without you? I would assume this is the next step. I don't find it either good or bad, but I'm genuinely curious where this all goes.

onion2k • yesterday at 8:53 PM

> Software is going to pile up because developing it is now cheap.

It won't, because right now we're busy exhausting the vein of good-ideas-we-wanted-to-build, and that's the source of all the good stuff you listed. When that runs out you'll see teams building any old crap because building is cheap, and learning that experimenting by putting any old crap in front of users is a fast way to burn goodwill and brand loyalty.

You still need good ideas and the taste to choose which to put out there over the bad ideas that people actively dislike.

onlyrealcuzzo • yesterday at 12:08 PM

> I regularly ship four features at a time now across multiple projects.

Many people are missing the fact that LLMs allow ICs to start operating like managers.

You can manage 4 streams now. Within a couple years, you may be able to manage 10 streams like a typical manager does today.

IME, LLMs don't speed you up that much if 1) you're already an expert at what you're doing (inherently not scalable), 2) you're only working on one thing (doesn't make sense when you can manage multiple streams), or 3) you're doing something LLMs are particularly bad at (not many remaining coding tasks, but definitely still some).

altruios • yesterday at 3:03 PM

> and fixing issues is now basically free if your company is willing to shell out for tokens.

Does "basically free" just mean, to you, that someone else is paying the cost? That's a mentality that has only made the world worse when applied to a wider range of things. Be hesitant in that line of thinking, I suggest, and consider the future.

reus09 • today at 4:18 AM

I'm seeing the exact opposite with LLMs. So much unmaintainable, brittle code is being generated, since devs are not even looking at the code and LLMs are dumb like 75% of the time.

nyxtom • yesterday at 12:27 PM

I agree with most of this, I just have sort of turned a blind eye to what the code actually probably looks like. Reviews are rapid, and I’ll admit I do feel like I’m betraying my inner programmer by just optimizing directly against the claims of token bot. But the way I see it, as long as the numbers don’t lie I’m okay with the process.

BlueRock-Jake • yesterday at 9:56 PM

Agreed on the floor being raised. The part I'd push back on is "fixing issues is now basically free." That's true for the issues that surface in code review or a failing test. The new class of issue is good-looking code that does something unexpected at runtime, usually through chains of tool calls that each looked fine in isolation. Those don't fail your tests. They fail in prod, sometimes quietly.

yodsanklai • yesterday at 9:44 PM

Sometimes I wonder if people praising AI work on the same type of code as I do.

Just now, I was working on a bug report. I had Claude write the code. Perfect, CI is green, new tests, everything seems fine. Took me 5 minutes. Then looking closer, I can see that there may be a performance regression and that the code seems pretty verbose. I iterate on the prompt "of course, you're right, let me fix this". New code is even more verbose, lots of comments that shouldn't be there, the code is more intricate, it takes me some time to understand what's going on. Plus new test cases to review.

After a day of asynchronous iterations on this, I finally sit down to look at this problem. There was a one line fix that Claude couldn't find on its own.

I lost time, reviewer lost time, and if this had been shipped as is, the system would have been worse. I could go on and on because this happens daily. And the worst part is teammates submitting slop.

pryelluw • yesterday at 8:54 PM

This is better stated as: use of agents has forced teams to adopt best practices and style guides.

Which is my experience. Once you get into the actual development process, the code itself produced by the agents is not good enough. Still needs editing and rewriting.

kiba • yesterday at 12:34 PM

Everyone talks about productivity as if that is the only metric that matters in the business.

> The MCP has now automated away all of the drudgery of programming, from summarizing emails, to generating confluence documentation, to generating slide decks.

I wonder about the hallucination. Reading someone's writing doesn't take all that long.

happytoexplain • yesterday at 12:08 PM

I think numerically this is the exception - and it's a fantastic exception! But in practice what I've seen is things getting worse because people still just aren't very good at thinking, so the great-looking Jira ticket actually turns out to be nonsensical in some subtle way, whereas before it was just lacking in some obvious way that could immediately be called out and had an obvious solution.

I.e. it's making good output better, but it's making mediocre output (which is most output) worse by adding volume and the appearance of quality, creating a new layer of FUD, stress, tedium, and unhappiness on top of the previously more-manageable problems that come with mediocre output.

I'm still seeing this even with the newest models, because the problem is the user, not the model - the model just empowers them to be even worse, in a new and different way.

globnomulous • yesterday at 2:53 PM

I was an LLM naysayer for a very long time. I continue to have serious reservations about the ethics of LLM use and the likely economic effects (these tools are likely to empower the owners of capital and disempower labor). On the other hand, I had a rather striking experience the other day that convinced me that the future in which these tools write software may not be so bad:

I had an idea to improve performance in one of the slowest but also one of the most critical parts of the codebase I own, so I asked Claude to re-write it. I gave it exact instructions. It got most things right but key things wrong. I caught the bugs and then asked it for some optimizations, and it came up with a number that were quite good. As I read the code, I saw more and more opportunities for improvement. To make a long story short, code that used to require upwards of 30 seconds in a particularly heinously ugly stress test now finishes in about 8ms.

My original code was terrible. That's indisputable. Maybe the bar for improvement was low. Still, the algorithms and optimizations that I was able to devise while using Claude Opus 4.6 surprised me. I don't often feel pleased with the cleverness of my work, but in this case the work really is stellar -- or at least enough of an improvement that it feels stellar.

Could I have written it without Claude? Yes, definitely. But I was able to produce the code in a few days while having a fever of 100-102, which I definitely couldn't have done on my own.

Moreover, it was plainly apparent to me, while I worked, that I was better able to think about high-level architecture and design because I wasn't stuck on the details of actually writing the code. The code itself, line by line, isn't difficult if you have familiarity with bitwise operations, but there's enough of it, with enough branches, that it's difficult as a whole and the work of writing it would have consumed much of my attention and energy.

Claude missed a huge amount. I improved performance by more than 95% after it told me there were no other opportunities for major optimizations.

Using the tool freed me, I found, to think more clearly, more deeply, and more effectively. Does the result create tech debt? I don't think so. I've pored over it and can't find anything lacking in style, design, or architecture. It's very well documented. Claude wrote tests, as I requested, for everything, including all the bugs that Claude missed and I caught. Test coverage is probably 100%, but, much more importantly, tests exhaustively cover cases, including edge cases, that would have, again, been difficult to enumerate and write by myself.

I doubt Claude could have done all this as well if the codebase and tests weren't already as mature as they are. I really wonder about the feasibility and advisability of greenfield software development with these tools. And a junior developer absolutely couldn't have accomplished what I did. The tool would have produced far worse work in the hands of someone who doesn't know what they were doing.

So I agree with you and disagree: I'm turning a corner on these tools, but I absolutely could not just let rip and trust it to do anything correctly. Moreover, I could not be less impressed by the MCPs written by people in my company. The bare tool by itself is better, though maybe that says more about my company, and my regard for the people I work with, than the tools.

mhitza • yesterday at 12:04 PM

> GPT 5.5 is the first model good enough for me to just let rip.

You know this is the exact same thing said during Opus 4.6, right?

That makes it hard to believe because it's the same "last week's model was so much behind you can't even comprehend" meme that's been going on throughout last year.

More info dumped into tickets and projects is great for understanding for both people and LLM. But hopefully not LLM generated.

Tade0 • yesterday at 1:53 PM

> fixing issues is now basically free if your company is willing to shell out for tokens.

Yeah, about that: I looked into Cursor's usage stats and daily I'm going through the equivalent of a bacon sandwich in my cantina, so not much, but this is at today's prices and very light usage of Sonnet.

I was for a time using Opus 4.6 for a heavier task and even then I think the cost was well into the double digit percentages of my salary.

Opus 4.7 reportedly uses more tokens overall and while they reportedly kept rates stable, that is not a given.

Just wait until, with increasing costs, the first company figures that they'll offer this as a benefit and then maybe scrap it altogether in the name of cost cutting.

oblio • yesterday at 12:31 PM

> Software is going to pile up because developing it is now cheap.

https://somehowmanage.com/2020/10/17/code-is-a-liability-not...

skywhopper • yesterday at 6:34 PM

Yes, the software that piles up literally is the tech debt. Every automation and tool that was vibe-coded has to be maintained as well. If software is 100x easier to write and you write 100x as much of it, then taking into account network effects, your tech debt is now 100x worse. Congrats!

acedTrex • yesterday at 6:57 PM

> The floor has been raised significantly as to what bad code can even look like

It's hard for me to disagree with this take more, wow. LLM slop code is TERRIBLE and verbose.

qazxcvbnmlp • yesterday at 2:58 PM

The gap between the AI haves and have-nots is starting to appear. 6 months ago a developer with Copilot was about on par with one without. The AI code required a lot of review, which took about the same amount of time as writing the code manually.

Now.. the AI first engineer might still have to deal with hallucinated things. But.. they can also use the newfound cheapness of code to improve their workflow. Instead of just testing on localhost and manually deploying to prod, you can have a full dev, staging, prod pipeline for free. Tech debt can be one command from being refactored. The open source package that doesn’t quite do what you need it to do? Fork it and write a patch. The ai will be able to maintain the patch. Oh.. you need that bespoke feature for management? Np, done in a 1hr ai session.

Each of these things might arguably be insignificant on its own, but over a project's lifetime they really add up.
