Hacker News

dofm today at 4:56 PM (17 replies)

No it's not. This has always been a needlessly iconoclastic rather than sensible suggestion.

At the very least it is not once you're working at the wrong kind of scale.

Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace.

And in the LLM era the wrong kind of scale appears in different ways: code is generated and duplicated without proper abstraction, then maintained by an LLM that cannot be trusted to make the same modification each time it encounters a pattern, or to have enough of an overview to slowly rescue duplicated code through good abstractions.

I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.


Replies

coldtea today at 5:10 PM

Hardly iconoclastic, it's a very sensible suggestion.

It would be iconoclastic if the common-sense default were to start with abstraction. It's not: the common-sense default is to write possibly duplicated behavior until you actually discover several cases to abstract away, until you develop a sensible idea of which functionality unites them and which doesn't carry over to all of them.

>Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace

Maintaining the wrong abstraction, or, God help us, abstractions, would be even worse.

a-dub today at 6:51 PM

i agree with the author. i'd argue for preferring loose coupling over centralized abstractions. sure, it's pleasing to compress the code, but if the use cases (along with their bugs and externally driven changes) actually diverge enough, the shared code ultimately becomes brittle, littered with edge cases behind if fences, and both challenging and daunting to change.

ideal case: support libraries and then very simple duplicated code that is easy to read and modify. critically the core control flow should remain duplicated, but simplified by the support libraries.
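
a minimal Python sketch of that ideal case, under stated assumptions: `parse_amount`, `format_amount`, `invoice_total` and `refund_total` are hypothetical names invented for illustration, and the parser only handles simple non-negative decimal strings. the helpers play the role of the support library, while each use case keeps its own copy of the loop.

```python
# Hypothetical support library: small helpers with no control flow of their own.
def parse_amount(text: str) -> int:
    """Parse a non-negative decimal string such as '12.50' into integer cents."""
    whole, _, frac = text.partition(".")
    # Pad or truncate the fractional part to exactly two digits.
    return int(whole) * 100 + int((frac + "00")[:2])


def format_amount(cents: int) -> str:
    """Format integer cents back into a decimal string."""
    return f"{cents // 100}.{cents % 100:02d}"


# Use case 1: the core control flow lives here, in plain sight.
def invoice_total(lines: list[str]) -> str:
    total = 0
    for line in lines:
        total += parse_amount(line)
    return format_amount(total)


# Use case 2: the same loop is deliberately duplicated, so this path can
# diverge (here, a fee and a floor at zero) without touching invoices.
def refund_total(lines: list[str], fee_cents: int) -> str:
    total = 0
    for line in lines:
        total += parse_amount(line)
    total = max(total - fee_cents, 0)
    return format_amount(total)
```

the loops look redundant, but a change to refunds never needs a flag or an `if` inside invoice code; only the helpers are shared.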

fny today at 5:04 PM

Code duplication is cheaper than the wrong abstraction. If you have a good abstraction, you should run with it.

If you haven't figured out a good abstraction at 5-100 customers, God help you.

dang today at 6:51 PM

I dislike duplicate code as much as anyone, but agree with the OP that bad abstractions can be worse. They add confusion and complexity which compounds over time, since people are forced to build on top of them in ways that (by definition) don't suit the underlying domain and ultimately become self-referential. This leads to contortions, workarounds and even more bad abstractions which ought not to be there—they're reactions to the code not fitting the problem, or as Fred Brooks called it, accidental complexity. You end up in an evolutionary dead end where the system is hard to extend because it's too hard to understand.

I've learned to tolerate a small amount of duplicate code for this reason. If the duplication remains small, it's not that harmful, and if it starts to grow, one has a better shot at finding a good abstraction for it. Bad abstraction is premature abstraction.

One thing I'm not sure this thread has mentioned yet is how LLMs alter the cost-benefit curve of this. They are much better at managing duplication than humans are, and much better at noticing inconsistencies - the sort of small bugs which duplication traditionally leads to. I don't know if this is enough to count as a different kind of good abstraction; I doubt it. It reminds me of a petroleum economist I once knew who had 200 duplicate spreadsheets analyzing different projects and who hired a junior analyst to keep them all consistent. An LLM would be like the junior analyst.

ubertaco today at 6:26 PM

I'd recommend clicking through the headline to watch the talk. Metz talks a lot about types of similarity: similarity by coincidence vs similarity due to an actual semantic or functional equivalence.

Code that is coincidentally similar very often diverges in either the short or long term, and DRYing it up aggressively tends to result in functions that have many boolean parameters that each trigger disjoint sets of behavior - which is a bit of a nightmare to maintain due to the high cognitive overhead of remembering how all the interleaved-but-actually-unrelated behaviors should work.

This outcome is low-cohesion code.

It's a useful concept to be aware of - worth clicking through to the actual content of the talk rather than just the headline.
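
A rough Python sketch of the boolean-parameter outcome described above. Everything here is a hypothetical illustration (the `user` dict layout and all function names are assumptions, not taken from the talk): first the aggressively DRYed function, then the same behavior split by intent.

```python
# Over-DRYed version: each flag switches on behavior that only one caller
# wants, so readers must reason about every flag combination.
def render_name(user: dict, for_email: bool = False, for_audit: bool = False) -> str:
    name = f"{user['first']} {user['last']}"
    if for_email:
        name = f"{name} <{user['email']}>"
    if for_audit:
        name = f"{name} (id={user['id']})"
    return name


# Split by intent: one function per use case, no flags to remember.
def display_name(user: dict) -> str:
    return f"{user['first']} {user['last']}"


def email_recipient(user: dict) -> str:
    return f"{display_name(user)} <{user['email']}>"


def audit_label(user: dict) -> str:
    return f"{display_name(user)} (id={user['id']})"
```

Nothing stops a caller from passing both flags to `render_name`, a combination nobody designed for; the split version cannot express it at all.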

mytydev today at 5:27 PM

It sounds to me like you are describing a good abstraction. This article does not claim that code duplication is better than any abstraction. It claims that code duplication is better than the wrong abstraction. I'm sure this author would agree that a good abstraction is better than code duplication.

agumonkey today at 5:18 PM

You seem to have experience. I don't mind factoring / unifying logic when it's done sensibly, with enough history in the trenches. It pains me more whenever a young dev comes in and barks "we must merge these two things!" repeatedly, without planning for more than two cases, and starts adding more and more boolean variables. Crystal makers. Then the obvious issue comes: the two variants weren't that close, and now there's one god class trying to handle all forces in one big state.

I agree that LLMs are naturally anti-abstraction machines. I'm often trying to find a way to reverse that.

ChrisMarshallNY today at 5:38 PM

In my experience, the answer is always "It Depends." That's about the only thing that I can hang "always" on.

It really depends on the exact type of code we're working with, and what our objectives are.

In my case, I often use object inheritance. It's a damn cheap way to DRY. However, when people hear "inheritance," they often think "polymorphism." There's a really big difference between the two, but popular culture has jammed them into one ball, and it's not worth the agita to try to explain the difference.

But if you are doing optimization, long stacks can be your enemy, and inheritance tends to have long, windy stacks.

In these cases, the copy/pasta method may well be the best approach.

Like I said, "It Depends."

cjfd today at 6:43 PM

100% agree. 'Code duplication is far cheaper than the wrong abstraction' is a very good candidate for the worst programming article ever.

nfw2 today at 5:34 PM

Over-engineering and "abstraction hell" are very much not iconoclastic concepts.

mawadev today at 5:01 PM

I think you applied this idea to the era of LLMs, but consider an abstraction that takes in multiple god structs for branches it may or may not call in the case you are looking at, and that has a lot of if conditions that explode in combinatorial complexity across a deep call chain. Now the bottleneck is that you need to call this function 144 times a second. That is where you start to have clusters of hot code paths where the latency stacks depending on the angle the god structs come in. Not sure what LLMs do here, I don't vibe code.

Capricorn2481 today at 5:09 PM

> I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.

Pretty much everyone arguing for duplication has argued what you are saying, which is to wait until you see a few instances of it before committing to an abstraction. No one is saying duplicate everything 100 times. So I don't think this discussion was ever iconoclastic.

Thaxll today at 5:50 PM

So you centralize 3-liners?

thinkloop today at 5:56 PM

The key lesson is that duplicate code is not necessarily "code duplication" - it was always really about abstraction duplication. If two unrelated variables happen to momentarily share a value, it doesn't mean that value should be made common between them, they are fundamentally different things. It would be a confusing lie and error-prone if the code implied they were the same and that efforts should be made for them to be in sync.
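
A small Python sketch of that point, with hypothetical names and values chosen only for illustration: two limits that happen to be equal today but describe different things, so they stay separate.

```python
# Two unrelated limits that momentarily share a value.
MAX_LOGIN_ATTEMPTS = 3   # security policy: lock the account after this many failures
MAX_UPLOAD_RETRIES = 3   # network resilience: give up on an upload after this many retries

# Tempting but misleading: a single shared MAX_TRIES = 3 would imply the two
# limits must stay in sync, which is not true of the underlying concepts.


def can_attempt_login(failed_attempts: int) -> bool:
    """Return True while the account is still allowed another login attempt."""
    return failed_attempts < MAX_LOGIN_ATTEMPTS


def should_retry_upload(retries_so_far: int) -> bool:
    """Return True while another upload retry is still permitted."""
    return retries_so_far < MAX_UPLOAD_RETRIES
```

If the retry budget later needs to grow, only `MAX_UPLOAD_RETRIES` changes, and the login policy is untouched.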

jimmypk today at 5:58 PM

[flagged]

tracerbulletx today at 5:29 PM

Huh? If anything, having lots of customers makes the argument for duplication stronger. The issue almost always arises once you get huge and 5 product teams are trying to achieve 5 different goals by using the same overwrought abstraction instead of just copying and decoupling. The abstractions that are actually stable end up becoming libraries or platform-team-owned systems that no one ever really touches.