No it's not. This has always been a needlessly iconoclastic rather than sensible suggestion.
At the very least it is not once you're working at the wrong kind of scale.
Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace.
And in the LLM era the wrong kind of scale appears in different ways: code is generated and duplicated without proper abstraction, then maintained by an LLM that cannot be trusted to make the same modification each time it encounters a pattern, or to have enough of an overview to slowly rescue duplicated code through good abstractions.
I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.
Hardly iconoclastic, it's a very sensible suggestion.
It would be iconoclastic if the common-sense basic approach were to start with abstraction. It's not; the common-sense default is to write possibly duplicate behavior until you actually discover several cases to abstract away, until you develop a sensible idea of which functionality unites them and which doesn't carry over to all of them.
>Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace
Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.
At work there’s been a huge amount of duplication since the start of the company and no solid abstraction. So no tests either. We introduced tests in the current architecture, but rewriting code carries a huge cost to make sure there is no regression. When we talk about a SaaS it’s non-trivial: with many customers relying on this tool daily as part of their workflow, regressions caused by a rewrite could be really painful for them. So we must give a greater budget to take the time to make sure nothing major breaks. So there is a debt that is compounding over time because code keeps being added. Duplication is bad, and weird/purist abstraction could make the architecture so rigid that rewriting things could generate bugs that are hard to understand and catch.
It’s hard to find a good balance and it depends on the kind of business and scale of project. Hard to make that a generic advice.
> Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.
Hard disagree. When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too. A bad abstraction would at least have had one fire in one place.
Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions? In a large enough codebase you'd have both the bad abstractions and all the not-quite-duplicate implementations ignoring the bad abstraction?
I'm using bad here loosely, it could be buggy, incorrect, incomplete, insufficient and more; while being owned by someone or some team that's a challenge to work with for various reasons (overloaded, under-resourced, overbearing, etc., etc.).
> Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions?
Obviously, yes. But it is my experience that this happens more slowly and that API invocations that break when the abstraction is changed are much easier to identify than broader duplicated patterns of code that span many lines and subtly diverge.
And even then those divergences are better because each wrapper around the abstraction is documenting the problem with it. But the abstraction can generally be replaced by one with the same API surface.
(Even if you take into account the fact that any API behaviour ultimately gets relied upon even if undocumented. Which is true.)
To be fair my experience is that of a freelancer and contractor who arrives trying to fix things that have been through many such hands. And I think if these developers had it drummed into their head that any attempt at abstraction would be better than copy and paste, these situations would be more knowable.
>A bad abstraction would at least have had one fire in one place.
On the contrary: that's precisely what a bad abstraction would not offer.
Instead it would spread its assumptions to different parts of the system, as every caller, sub-service, etc. would have to change shape to fit in that abstraction's box, however unnatural it is (and we know it would be unnatural, because we already said it's a bad abstraction).
A good abstraction? As in one? I'd go so far as to say the process of discovering and refining abstractions is the most important part of software engineering. A large project has dozens of abstractions, and some of them are "wrong" at any time, as you discover over time. None are ever perfect. If you wait to stop duplicating code until you have the "right" abstraction, you are just putting off the hard part of developing software and taking on tech debt.
Half of your abstractions are wrong. The hard part is knowing which half.
What if there is no good abstraction for the entire stack of software on each of computers? What if we built a common one because we had to? What if now we get to all make our own with natural language?
But also it's very possible to not realise you needed an abstraction until it catches fire in multiple places.
And quite often it's not you that got the codebase to a hundred customers, is it? Sometimes it is a sequence of fresh-faced young developers who didn't have the authority to say "this duplication is bullshit" and were instead compelled to repeat it.
I think a lot of these discussions happen in nice little blog-post vacuums of progressive thinking, where people can go "mmm, object oriented coding obscures intent and clarity, mmm", blog posts with "an X is a Y", "the unreasonable effectiveness of foobar" etc.
In the real world, every duplication that works sticks for good; there is rarely budget to electively replace code that isn't broken. Until one day it doesn't work. And then… how many times is it actually duplicated? How many of the duplicates diverged? How many of these do we no longer need?
In my experience, the answer is always "It Depends." That's about the only thing that I can hang "always" on.
It really depends on the exact type of code we're working with, and what our objectives are.
In my case, I often use object inheritance. It's a damn cheap way to DRY. However, when people hear "inheritance," they often think "polymorphism." There's a really big difference between the two, but popular culture has jammed them into one ball, and it's not worth the agita, to try to explain the difference.
But if you are doing optimization, long stacks can be your enemy, and inheritance tends to have long, windy stacks.
In these cases, the copy/pasta method may well be the best approach.
It sounds to me like you are describing a good abstraction. This article does not claim that code duplication is better than any abstraction. It claims that code duplication is better than the wrong abstraction. I'm sure this author would agree that a good abstraction is better than code duplication.
But it’s never going to be 1:1 duplication is it? Sometimes it’s better to copy code as a template for something new, rather than try to immediately force a new abstraction.
I agree with you that it’s a truism, but it’s useful advice for people who have a habit of trying too hard to DRY their code. IIRC the author comes from the Ruby world, where DRY was a big thing, and this talk was part of the pendulum swinging back away from this DRY obsession that sometimes just resulted in convoluted code.
You seem to have experience. I don't mind factoring / unifying logic when it's done sensibly, with enough history in the trenches. It pains me more whenever a young dev comes in and barks "we must merge these two things!" repeatedly, without planning for more than two cases, and starts adding more and more boolean variables. Crystal makers. Then the obvious issue comes: the two variants weren't that close, and now there's one god class trying to handle all forces in one big state.
I agree that LLMs are naturally anti-abstraction machines. I'm often trying to find a way to reverse that.
> I agree that LLMs are naturally anti-abstraction machines. I'm often trying to find a way to reverse that.
I am a bit of an LLM cynic but I am trying to learn it all, and I have to say I have spent most time trying to work out: how do you explain how a brown-field codebase actually works, in such a way that the LLM won't pervert it through misunderstanding.
It does encourage you towards the "conventional" coding standard for any new project, because you want to use a pattern that it will have seen in its training set.
But for example there are differences of opinion in how wordpress plugins (which have a very complex control flow) should be structured. LLMs are incredible at knowing how WP works, actually, but what is difficult is explaining how your methodology for a large plugin is going to work.
It is a battle — but a useful one because it can be used for, er, studying the comparative belief systems of the LLMs.
I think you applied this idea to the era of LLMs, but consider an abstraction that takes in multiple god structs for branches it may or may not call in the case you are looking at, and has a lot of if conditions that explode in combinatorial complexity across a deep call chain. Now the bottleneck is that you need to call this function 144 times a second. That is where you start to have clusters of hot code paths where the latency stacks depending on the angle the god structs come in.
Not sure what LLMs do here, I don't vibe code
I am applying it to LLMs on the basis of twenty years of seeing smaller programming shops tie themselves in knots by using duplication to avoid developing an abstraction that would help them because they were unsure of it.
Everyone always thinks duplication is fine when you can bill the modifications by the hour. But they never think to understand that the reason they've had so many employees is that they've turned their change process into firefighting all the different versions of the same code and all these young developers burn out from the sheer anxiety of not knowing where all the little fires are.
I once had to rescue a site that had become a victim of its own popularity, that was written by subcontractors who clearly believed that duplication is better than the wrong abstraction.
Until one day, along came a change — MySQL 4 to MySQL 5 — and a significant duplicated query no longer worked due to its new, proper strictness.
The problem was compounded; not only was the broken pattern in hundreds of places where it had sat, stable and predictable, but the pattern was broken because it, itself, was avoidance of another abstraction that would solve it.
They quit: they said they couldn't and wouldn't fix it. It had always worked how they had done it, and it would have to stay on MySQL 4 (which the hosting provider refused to accommodate).
I don't think it helped that they were severely misguided in their understanding of SQL, but the code had become beholden to duplication and then crippled by a new problem in the duplicated pattern.
I had to first find all the contexts in which that pattern appeared (which required me to spend half a day on a bespoke script) and then work out a new pattern and as few variations of it as possible to fix the duplicated code in each place, because there was no proper budget to rewrite the whole thing. And then I sat at my desk, for days, working through each one, figuring out how to change it to fit the slightly different expression of the pattern.
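For anyone facing the same hunt, a bespoke script of that kind can be very small. This is only a sketch of the idea in Python, not the script I used: `scan_text`, `scan_tree`, the suffix list and the regex are all hypothetical placeholders, and the real pattern will be whatever your duplicated code looks like.

```python
import re
from pathlib import Path


def scan_text(name, text, pattern):
    """Return (name, line number, stripped line) for every line matching pattern."""
    regex = re.compile(pattern)
    return [
        (name, lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


def scan_tree(root, pattern, suffixes):
    """Walk a source tree and collect matches from files with the given suffixes."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # errors="replace" keeps the scan going over files with odd encodings.
            text = path.read_text(errors="replace")
            hits.extend(scan_text(str(path), text, pattern))
    return hits
```

Something like `scan_tree("src", r"(?i)select\s+\*", {".php"})` gives you a worklist of file and line pairs. A line-based regex will miss variants that span several lines, which is exactly where the subtly diverged copies tend to hide.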
Even a total bullshit abstraction would have saved that client both time and money. And this is only one of dozens of times I've seen small firms simply duplicate and change code that would later become unmaintainable because of a straw breaking a camel's back.
Again, this is the opposite of what the author argues for, which is waiting for a couple instances before committing to an abstraction. Not duplicating a SQL query across hundreds of places.
I would be curious if the previous coders you're talking about actually cited duplication as a good thing. You seem to be implying they are. But almost every instance I've seen of massive code duplication was just from bad programmers shooting from the hip, not from some ideological stance.
> Again, this is the opposite of what the author argues for, which is waiting for a couple instances before committing to an abstraction. Not duplicating a SQL query across hundreds of places.
Right. But this is a hypothetical, in-a-vacuum situation.
In the real world, your two, three duplicates are in production.
"We really should now de-duplicate this"
"There is not the time or budget, just copy it again; we'll replace all this one day".
> I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.
Pretty much everyone arguing for duplication has argued what you are saying, which is wait to see a few instances of it before committing to an abstraction. No one is saying duplicate everything 100 times. So I don't think this discussion was ever iconoclastic.
The point is it sounds all smart and sophisticated and principled in the abstract environment of a code discussion in a blog post.
In the real world, duplication happens in an emergent way, there isn't the time each time to judge whether it's really time to just quietly abstract that code, you may not get the permission, budget or window to do it, and if you don't stop the rot really early you are locked into the pattern.
Huh? If anything having lots of customers makes the argument for duplication stronger. The issue is almost always once you get huge and 5 product teams are trying to achieve 5 different goals by using the same overwrought abstraction instead of just copying and decoupling. The abstractions that are actually stable end up becoming libraries or platform team owned systems that no one ever really touches.
I feel like the balance has shifted over the last 30 years, and is speeding up. Semi-automatic and fully automatic re-factoring has made dealing with duplicated code much faster, cheaper and safer. Changing abstraction is still high risk.
I have regularly watched agents forget to update one duplicated pattern after changing it somewhere else. If it's within a single file or related class, it'll catch it, but if it's off in some other package in the monorepo, it's a crapshoot.
I used to struggle with abstractions back in my OOP days but since moving pretty much to a purely functional approach I find that code duplication is rare. Just have a function and call it in two parts. The main abstraction issue is then data structures but with TypeScript interfaces being duck typing essentially I run into few problems there as well.
So code duplication because of abstraction issues is rare. Code duplication because of siloed developers is so much more common.
For hobby, I use functional languages, and I find the techniques are the important bits to remember. Most modern languages let you easily stand on functional programming theory. You don't need to know Haskell. Everyone's brain works differently, but the idea of small, simple and occasionally flexible parts building a whole works for me. As opposed to the large complex do it all shape shifting machine.
I assume they mean to call the function from two (or more) parts of the code (i.e. locations). It's not immediately apparent why this is meaningfully different from what would be possible in Java though, since ostensibly a function is the same as a method, just with the receiver moved into the list of parameters. (There are some things you can do in a Java method that don't translate to most functional languages, like invoking the version of the method from a superclass, but there's nothing forcing you to do any of those from the language perspective, so it seems a bit strange to claim that the language itself is the issue rather than the specific patterns that were chosen, maybe by their coworkers, or that were just common in the ecosystem.)
Echoing the article, anyone who has experienced both will agree: it’s far easier to work with an under engineered code base than an over engineered one.
This step should also be parameterized by how many times the duplication has occurred. Refactoring preemptively may lead to poor abstractions, but not refactoring after seeing the exact same thing tens of times would also be weird. See also:
+1 The worst code I had to maintain was code that tried to follow DRY (without the trying to understand what the original intention of that principle was). The only way out of that mess was widespread code duplication.
i recall very early in my career i did exactly this. i took what worked duplicated it—my reasoning being that it was far safer to reuse what has been battle tested and leave refactoring at a later stage
it wasn't received well and senior developer told me that 'good developers know exactly what patterns to use all the time before writing any piece of code and that he will clean up my mess'
long story short his refactoring caused what was otherwise a stable system into a complete mess and it reminded me of Nassim Taleb's book
It's definitely an "it depends" thing. It's easy to overabstract. On the other hand, I've also met junior developers who just didn't know how to use function parameters.
Except 9/10 times microservices end up wildly dependent on each other, yielding a distributed monolith. Better to use service oriented architecture and just ship the monolith, you can test easier and skip the extra layers of serialization / deserialization.
I believe that "single source of truth" is a principle that should always be followed. If there's duplicated code where it'd be a bug if they diverge, then you should refactor. It creates a long-distance coupling in your code that may be invisible to future developers until a bug emerges.
But with that in mind, I mostly agree with the article: if it's not a violation of "single source of truth", then abstractions are just a convenience. If an abstraction starts being inconvenient, then it's not doing its job and there's no reason to use it. It's a serious code smell if a function needs several flags for custom behavior; that means it's probably the wrong abstraction or violating the single responsibility principle. If there is a legit need for lots of customization, an often-good way to handle it is to take a function/functor as an argument for the customization. E.g., rather than `solve(f:double -> double, max_iters = 99, x_abs_tol = 1e-15, x_rel_tol = 1e-15, ...)` you can do `solve(f:double -> double, stopping_criteria: StoppingCriteriaClass)`
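A minimal sketch of that last suggestion, in Python rather than the pseudo-signature above. Everything here is hypothetical: `StopState`, `tolerance_or_budget`, and the fixed-point loop inside `solve` are stand-ins, since the original only names `solve` and a stopping-criteria argument.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StopState:
    """What the solver exposes to a stopping criterion (hypothetical shape)."""
    iteration: int
    x: float
    prev_x: float


StoppingCriteria = Callable[[StopState], bool]


def tolerance_or_budget(max_iters: int = 99,
                        x_abs_tol: float = 1e-15,
                        x_rel_tol: float = 1e-15) -> StoppingCriteria:
    """Bundle the former keyword flags into one reusable criterion."""
    def should_stop(s: StopState) -> bool:
        step = abs(s.x - s.prev_x)
        return (s.iteration >= max_iters
                or step <= x_abs_tol
                or step <= x_rel_tol * abs(s.x))
    return should_stop


def solve(f: Callable[[float], float], x0: float,
          stopping_criteria: StoppingCriteria) -> float:
    """Fixed-point iteration x <- f(x), standing in for the real solver."""
    prev_x, x, iteration = x0, f(x0), 1
    while not stopping_criteria(StopState(iteration, x, prev_x)):
        prev_x, x = x, f(x)
        iteration += 1
    return x
```

Usage would look like `solve(lambda x: (x + 2 / x) / 2, 1.0, tolerance_or_budget())`, and a caller with unusual needs passes its own callable instead of asking for one more flag on `solve`.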
I once used code duplication to implement a fourth type of dialog that looked somewhat similar to the others, which were sharing a lot of code, because I felt that although it looked much the same as the others, there was some fundamental difference. It took me about a day to implement. When another engineer saw this, he spent the next three weeks trying to integrate all of them with some shared class. His work was not completely worthless, because he did find a small bug during all his efforts to avoid any possible code duplication. I had already predicted that it would take a lot of effort, but I did not object, because I hoped that he would learn something from it and think twice next time before always trying to avoid code duplication.
Two talks come to mind here: Mike Acton's Data-Oriented Design and C++ [1] and Bryan Cantrill's The Complexity of Simplicity [2].
Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions. I can't do the talk justice, but it's had a big impact on me.
Bryan's talk is about abstraction generally, and how it's difficult to find the "right" abstraction.
> Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions.
I've always found it odd when even fairly smart engineers sometimes prioritize real-world metaphors over the actual needs of the codebase. Years ago, when I was only a few years out of school, I was implementing a connection pool in Rust, and the most reasonable way to implement it was to have the connection hold a weak reference to the pool so that it could get checked back in automatically when dropped. My manager (an extremely experienced engineer) didn't like this idea because "a library holds library books, not the other way around". I didn't feel like this was a compelling reason to design things differently, but he refused to engage with the issue in any way other than through the lens of that metaphor. Eventually the impasse was resolved by one of the other managers in my department suggesting that while library books don't contain libraries, they do have the name of the library stamped in the back as a reference to where they should be returned, and I guess my manager found this to be a reasonable extension of the analogy. If I had been more experienced, maybe I would have recognized that I could find a way to engage with the analogy like the other manager did without ceding the point, but even today I still feel that it was completely bizarre to insist on that as the canonical way to frame things rather than just considering the ramifications of the abstraction in the code and the experience of using the library based on it.
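For readers who haven't seen the shape of that design, here is a rough sketch of it. It's in Python rather than Rust, so a `with` block stands in for Rust's drop, and the `Pool` and `Connection` classes and their methods are hypothetical names, not the code from that job.

```python
import weakref


class Connection:
    """Pooled connection that knows where to return itself (the 'library stamp')."""

    def __init__(self, conn_id, pool):
        self.conn_id = conn_id
        # Weak reference: the connection can find its pool but does not keep it alive.
        self._pool_ref = weakref.ref(pool)

    def release(self):
        pool = self._pool_ref()
        if pool is not None:  # the pool may already be gone; then there is nothing to do
            pool.check_in(self)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False  # never swallow exceptions from the with-body


class Pool:
    def __init__(self, size):
        self._idle = [Connection(i, self) for i in range(size)]

    def acquire(self):
        return self._idle.pop()  # raises IndexError when the pool is exhausted

    def check_in(self, conn):
        self._idle.append(conn)

    @property
    def idle_count(self):
        return len(self._idle)
```

With `with pool.acquire() as conn: ...` the connection goes back to the pool on exit without the caller ever naming the pool again, which was the whole point of the back-reference.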
It depends on whether the duplication is accidental or real. I.e. if two taxes are using the same formula, it is accidental. If you use the same physics formula in multiple places, it is real duplication.
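A tiny illustration of the difference, with made-up rates and function names. The two tax functions are textually identical today but are set by unrelated rules, so they stay separate; the physics formula is the same fact everywhere, so it lives in one place.

```python
# Accidental duplication: same formula today, independent reasons to change tomorrow.
def city_tax(amount):
    return round(amount * 0.05, 2)  # hypothetical city rate


def county_tax(amount):
    return round(amount * 0.05, 2)  # hypothetical county rate, equal only by coincidence


# Real duplication: one physical law, so one definition that every caller shares.
def kinetic_energy(mass_kg, speed_m_s):
    return 0.5 * mass_kg * speed_m_s ** 2


def braking_energy(vehicle_mass_kg, speed_m_s):
    return kinetic_energy(vehicle_mass_kg, speed_m_s)


def impact_energy(projectile_mass_kg, speed_m_s):
    return kinetic_energy(projectile_mass_kg, speed_m_s)
```

If the county changes its rate, only `county_tax` changes. Merging the two tax functions into one would have turned that into a flag or a parameter added under pressure.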
2016 (up to 2018 or so) may have been the peak of such varied activity in the developer ecosystem, including articles like this, whether it was discussion, ideation, OSS variety, language development.
There has been growth since but it's been concentrated into fewer channels and somewhat industrialized.
Yes. I’m dealing with a graphql, urql, Next, Prisma stack at the moment. Something that would be a handful of lines of code in a different stack ends up being hundreds in this one.
I don't know about you, but I generally don't write code in a vacuum. Other people may have touched it before me. Those other people may have made poor decisions.
Not that I'm immune from choosing the wrong abstraction sometimes. More than once the "other people" was me. We all make mistakes.
Interface over inheritance is the paradigm I try and stick to. I'd rather maintain orthogonal code than code with overuse of inheritance because of over adherence to DRY.
While I see the point, I think I more often encounter the opposite. Duplication, but not exactly duplication.
Then the "sunk cost fallacy" is not an issue, but there is a huge maintenance cost and no one feels like refactoring it. I'd rather refactor a bad abstraction than 10x duplication.
but those are exactly the cases where the distinction matters. when you have a situation where you can't duplicate the code exactly, then you really have to look carefully if this is actually the right place for a shared abstraction. i tend to wait and see if i can refactor one or the other to get them to be exact duplicates and only then see if i can fit in a common abstraction. and yes, finding that i later need to make the same change in both places is a sign that a common abstraction is probably the right call.
Triplication tends to be where it becomes more clear what the correct thing to abstract or de-duplicate is.
It's of course possible to functional-ize segments of logic, but then the question of state mutation must be brought up. How isolated are these changes from other parts of the code / system state. Can this be run in parallel or is it something that must be serial? What potential race conditions exist?
I've seen the pendulum swing between duplication and abstraction a few times in my career, and I'm currently on team "it's usually not that hard to find a good abstraction up front."
IMO it's easier to inline a bad abstraction than it is to consolidate a bunch of subtly different things that should have been abstracted from the beginning.
But I expect people's opinions on this differ wildly based on their personal experiences. Just my anecdotal take.
Depends. If the abstraction is just a level of indirection, then it is usually pretty simple to eliminate - just hit “inline function” in the refactoring tool a few times.
On the other hand it is pretty difficult and error-prone to consolidate duplicated code which has drifted apart over time.
If in doubt, choose the approach which is simplest and least risky to revert if you discover in the future that you made the wrong choice.
I do agree a bad abstraction can cause huge problems. But it’s usually not the kind of abstractions introduced to eliminate code duplication, but the kind of top-down “architecture astronaut” abstractions, where a model is chosen which does not fit the complexity of the problem.
I once had to work with a system that was refactored and abstracted away heavily to use Redux. It didn't work then, the implementation had way too many abstractions, doing any change meant you had to touch dozens of files. It was insanity. Left me with a bitter taste regarding the redux pattern for ever (probably not the pattern's fault).
Over-abstraction is as much of a problem as under-abstraction. If the abstraction isn’t improving your ability to produce good code, it’s a bad abstraction. I’ve worked with a lot of abstraction patterns in a lot of languages over the 30 years of my career. Any of them can be good or bad. Unthinkingly applying them is always a problem.
Unsurprisingly, that goes for just about any idea in software development. I worked in one code base that heard small functions are best, so every function was less than three lines long. You don’t gain anything by replacing `lst.get(0)` with `get_first_item_in_list(lst)` (in fact, understanding becomes much more difficult), but breaking down functions into the smallest units that make sense independently within the business domain can be very helpful, both for understanding and testing.
100% agreed. It's interesting since I see over-abstraction often abused by "clever" engineers (sometimes quite experienced actually). Sometimes I wonder if they do that to make themselves indispensable on purpose and create their silos in the codebase.
If you work backward from the schema these sorts of things tend to evaporate before they can become a problem.
Some of the biggest rabbit holes come from naming conventions not aligning across the business and technology silos. If everyone agrees that Customer has exactly 34 attributes, then it is possible to move to the next step of sharing libraries of types across the team. Getting your POCOs/DTOs 1:1 across the board is when the duplication really starts to melt away.
I watched a talk by her about this, and this post is missing half of the equation, which is really important:
Having a wrong abstraction means you end up with a class/function/module with a huge number of configurations through boolean/enum parameters. It's not even clear that all combinations of configurations are even valid. This situation may be simplified by duplicating, and then eliminating code, thus creating more streamlined code for each use case. This may require fixing similar or cross-cutting bugs in multiple places (e.g. JSON serialization is stupid, need to hack a workaround), but keeps the business logic changes simple. The code may be a bit more numerous, but it is able to surface all the scenarios to consider.
Having no abstraction means you may have to change business logic consistently in multiple places, or you have to fix exactly the same misconception (aka a bug) in multiple cases. e.g. tax rate management in a multi-national context. This is also terrible, because you may fix an important problem in one place and forget other places with the same issue. Now you missed 12 potential bugs by fixing one. This can however allow you to discover a true abstraction. Maybe these 12 places should call just one place?
But for code evolving across a team understanding this tension, a bit of duplication while waiting for confirmation that these pieces of code break together and change together is better than just shoving the same 3 if-statements into a function to avoid "line duplication". Concept duplication is more important.
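To make the first half concrete, here is a small sketch of the "duplicate, then eliminate" move on a flag-driven function. The function names, the discount and the tax rate are all invented for illustration.

```python
TAX_RATE = 0.2            # hypothetical
WHOLESALE_DISCOUNT = 0.9  # hypothetical


def format_total(amount, *, wholesale=False, include_tax=True, refund=False):
    """Flag-driven version: 2**3 = 8 combinations, and nobody knows which are valid."""
    if wholesale:
        amount *= WHOLESALE_DISCOUNT
    if include_tax:
        amount *= 1 + TAX_RATE
    if refund:
        amount = -amount
    return f"{amount:.2f}"


# After copying the body into each caller and deleting the branches it never takes:
def retail_total(amount):
    return f"{amount * (1 + TAX_RATE):.2f}"


def wholesale_total(amount):
    # Assumption for this sketch: wholesale totals are quoted before tax.
    return f"{amount * WHOLESALE_DISCOUNT:.2f}"


def retail_refund(amount):
    return f"{-amount * (1 + TAX_RATE):.2f}"
```

Three short functions name the three scenarios that actually exist, and the five flag combinations nobody ever used disappear. The cost is that a change to rounding now has to be made three times, which is the tension the second half describes.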
The "99 Bottles of OOP" book mentioned at the bottom was an excellent introduction to refactoring. I highly recommend it if you struggle with finding the right data models for the problems you work on.
This is the biggest lesson I got from LLMs. I have a 1 million LOC vibe-coded project that I can only imagine would fit in a few hundred thousand lines. But it's still holding up; I expected some kind of development collapse long before this point.
OP is right that code duplication is far cheaper than the wrong abstraction, but the opposite is also true - the right abstraction is far cheaper than code duplication.
It's made me wonder the same, but most LLM generated codebases haven't been around long enough to judge maintainability. I have noticed issues in some of my more LLM heavy code when I expect a change to be replicated in multiple areas, assuming common code / styling was reused, only to find it wasn't. It's for that reason I can't use LLMs for client codebases without heavy scrutiny of every line generated (for my own hobby projects I'm a lot more lenient)
Well sooner or later I would expect a developer who intimately understands their code base to feel compelled to start refactoring and extracting fitting, meaningful well-leveraged abstractions.
> Do you want to iterate using for loop or using .iter().step(2).map()?
This isn't really a good example, assuming both can be used to represent the same thing.
The problem with the wrong abstraction is when your abstraction doesn't let you represent something. Then, because you've already invested so heavily in it, you start contorting the problem to fit your abstraction and it becomes a shit show.
Code duplication is the wrong abstraction too -- unless it's not really code duplication but code that only happens to be similar for some really "unstable" reason.
I would agree that there are good "de minimis" reasons not to abstract code that isn't ready to be abstracted at all. If the pattern has not settled it shouldn't be forced into an abstraction (beyond those that make sure it is e.g. not vulnerable)
But beyond that, any stable abstraction is better than duplicated code.
I’ve seen code bases that evolved like that. The problem is almost always outside the abstraction that has a pile of conditionals.
Usually, some moron decided to copy paste things a few levels up and then the top half of the system metastasized into two parallel universes of broken garbage.
For instance, one might decide to perform auth later in the flow so unauthorized handlers can run and set a “this requires auth” bit that defaults to false, and the other flow could add a forged auth header before the auth step.
Now, the auth handler needs an “allow forged header” flag and an “already authenticated” flag.
I’ve seen that grow to a half dozen cases until massive production dataloss occurred. A buggy client tried to delete something local to their account without specifying a userid as a parameter (this codebase was garbage!) and deleted the something for all users instead.
I can’t remember how the dataloss was “fixed”, but it definitely wasn’t “all requests go through a simple auth check, and all handlers declare/implement their auth requirements in the same way”.
Getting a design approved to require a user id be specified exactly once for account-level operations was fantasy land for that team. (Most hires with any sort of engineering talent bounced in under a year.)
Anyway the “abstractions are hard so copy paste” approach did provide job security for the lifers on that product. I can’t imagine them holding a job elsewhere, but they were completely immune to layoffs (hostage style).
This is a pretty valid approach if you’re an agent hired to perform industrial sabotage, or if you keep replacing keyboards after you gnaw through the corner.
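A minimal sketch of the alternative described above, where all requests go through one auth check and all handlers declare their auth requirement the same way. Every name here (`route`, `dispatch`, `Request`, `Unauthorized`) is hypothetical and not taken from that codebase; it only illustrates the shape.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Request:
    path: str
    user_id: Optional[str] = None  # None means the caller is not authenticated


@dataclass(frozen=True)
class Route:
    handler: Callable[[Request], str]
    requires_auth: bool = True  # safe default: a handler must opt out explicitly


class Unauthorized(Exception):
    pass


ROUTES: Dict[str, Route] = {}


def route(path: str, requires_auth: bool = True):
    """Register a handler and declare its auth requirement in one place."""
    def register(handler: Callable[[Request], str]):
        ROUTES[path] = Route(handler, requires_auth)
        return handler
    return register


def dispatch(request: Request) -> str:
    """Single entry point: the auth check runs before any handler code."""
    entry = ROUTES[request.path]
    if entry.requires_auth and request.user_id is None:
        raise Unauthorized(request.path)
    return entry.handler(request)


@route("/health", requires_auth=False)
def health(request: Request) -> str:
    return "ok"


@route("/items/delete")
def delete_items(request: Request) -> str:
    # dispatch() guarantees user_id is set here, so the delete is always scoped.
    return f"deleted items for {request.user_id}"
```

With this shape there is no "already authenticated" or "allow forged header" flag to thread through: a handler either needs auth or says it doesn't, and an unscoped delete never reaches the handler.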
> Code duplication is far cheaper than the wrong abstraction
Very true in some sense, but I continue to encourage DRY-bias because I've literally never seen teams duplicate code responsibly and later dedupe it when it's the right time. 95% of the time this sentiment is quoted to justify shipping quick slop and stable reusable bits are never extracted into a shared lib later.
In my experience if your organization can't commit to doing WET (write everything twice) code then it probably also will fail at doing DRY (don't repeat yourself) code
Maybe this is an area where AI can help identify duplicate code though to show opportunities for de-duping
I prefer the go mantra: a little copying is better than a little dependency.
Abstraction is a vague term when used here. Is a shared function an “abstraction”? It’s more like implementation hiding, maybe some data hiding. But you definitely have a dependency on it now.
Acronyms like DRY are for beginners. Once you get good you know when to break the “rules” (and when not to).
No it's not. This has always been a needlessly iconoclastic rather than sensible suggestion.
At the very least it is not once you're working at the wrong kind of scale.
Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace.
And in the LLM era the wrong kind of scale appears in different ways; code generated and duplicated without proper abstraction and then maintained by an LLM that cannot be trusted to do the same modification each time it encounters a pattern or to have enough of an overview to slowly rescue duplicated code through good abstractions.
I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.
Hardly iconoclastic, it's a very sensible suggestion.
It would be iconoclastic if the common-sense basic approach were to start with abstraction. It's not; the common-sense default is to write possibly duplicate behavior until you actually discover several cases to abstract away, until you develop a sensible idea of which functionality unites them and which doesn't carry over to all of them.
>Once you have an awkward number of customers (more than five and less than a hundred), maintaining duplicated code that should have been abstracted and modularised will only seem cheap if you don't mind that you burn through even junior employees at a pace
Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.
At work there was a huge amount of duplication in the early days of the company and no solid abstraction, so no tests either. We introduced tests in the current architecture, but rewriting code has a huge cost because we have to make sure there is no regression. For a SaaS it’s non-trivial: many customers rely on this tool daily as part of their workflow, and regressions caused by a rewrite could be really painful for them. So we must budget more time to make sure nothing major breaks, and the debt compounds over time as code is added. Duplication is bad, but weird/purist abstraction could make the architecture so rigid that rewriting things generates bugs that are hard to understand and catch. It’s hard to find a good balance and it depends on the kind of business and scale of project. Hard to make that generic advice.
"It’s hard to find a good balance and it depends on the kind of business and scale of project".
Exactly. The abstraction purists are not working in the messy, deadline-driven real world.
> Maintaining the wrong abstraction, or, god help, abstractions, would be even worse.
Hard disagree. When you've had to chase through a change in untold and actually unknown numbers of duplications of code in different permutations and fix them because they are all on fire simultaneously, you'd disagree too. A bad abstraction would at least have had one fire in one place.
Good faith question: would it?
Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions? In a large enough codebase you'd have both the bad abstractions and all the not-quite-duplicate implementations ignoring the bad abstraction?
I'm using bad here loosely, it could be buggy, incorrect, incomplete, insufficient and more; while being owned by someone or some team that's a challenge to work with for various reasons (overloaded, under-resourced, overbearing, etc., etc.).
> Wouldn't most large codebases with poor abstractions just have engineers engineer around them with their own solutions?
Obviously, yes. But it is my experience that this happens more slowly and that API invocations that break when the abstraction is changed are much easier to identify than broader duplicated patterns of code that span many lines and subtly diverge.
And even then those divergences are better because each wrapper around the abstraction is documenting the problem with it. But the abstraction can generally be replaced by one with the same API surface.
(Even if you take into account the fact that any API behaviour ultimately gets relied upon even if undocumented. Which is true.)
To be fair my experience is that of a freelancer and contractor who arrives trying to fix things that have been through many such hands. And I think if these developers had it drummed into their head that any attempt at abstraction would be better than copy and paste, these situations would be more knowable.
>A bad abstraction would at least have had one fire in one place.
On the contrary: that's precisely what a bad abstraction would not offer.
Instead it would spread its assumptions to different parts of the system, as every caller, sub-service, etc. would have to change shape to fit in that abstraction's box, however unnatural it is (and we know it would be unnatural, because we already said it's a bad abstraction).
Abstraction is not the same as encapsulation.
In your mind, what's the cost of the wrong abstraction?
Yeah, "Write Everything Twice" is a pretty common and sensible direction for any codebase
Yeah, ~"Write Everything Twice"~ “Copy and Paste Working Code” is a pretty common and sensible direction for any codebase
Code duplication is cheaper than the wrong abstraction. If you have a good abstraction, you should run with it.
If you haven't figured out a good abstraction at 5-100 customers, God help you.
A good abstraction? As in one? I'd go so far as to say the process of discovering and refining abstractions is the most important part of software engineering. A large project has dozens of abstractions, and some of them are "wrong" at any time, as you discover over time. None are ever perfect. If you wait to stop duplicating code until you have the "right" abstraction, you are just putting off the hard part of developing software and taking on tech debt.
Half of your abstractions are wrong. The hard part is knowing which half.
What if there is no good abstraction for the entire stack of software on each of our computers? What if we built a common one because we had to? What if now we get to all make our own with natural language?
I disagree.
But also it's very possible to not realise you needed an abstraction until it catches fire in multiple places.
And quite often it's not you that got the codebase to a hundred customers, is it? Sometimes it is a sequence of fresh-faced young developers who didn't have the authority to say "this duplication is bullshit" and were instead compelled to repeat it.
I think a lot of these discussions happen in nice little blog-post vacuums of progressive thinking, where people can go "mmm, object oriented coding obscures intent and clarity, mmm", blog posts with "an X is a Y", "the unreasonable effectiveness of foobar" etc.
In the real world, every duplication that works sticks for good; there is rarely budget to electively replace code that isn't broken. Until one day it doesn't work. And then… how many times is it actually duplicated? How many of the duplicates diverged? How many of these do we no longer need?
> I disagree.
So... the wrong abstraction, no matter how bad, is better than code duplication?
In my experience, the answer is always "It Depends." That's about the only thing that I can hang "always" on.
It really depends on the exact type of code we're working with, and what our objectives are.
In my case, I often use object inheritance. It's a damn cheap way to DRY. However, when people hear "inheritance," they often think "polymorphism." There's a really big difference between the two, but popular culture has jammed them into one ball, and it's not worth the agita, to try to explain the difference.
But if you are doing optimization, long stacks can be your enemy, and inheritance tends to have long, windy stacks.
In these cases, the copy/pasta method may well be the best approach.
Like I said, "It Depends."
It sounds to me like you are describing a good abstraction. This article does not claim that code duplication is better than any abstraction. It claims that code duplication is better than the wrong abstraction. I'm sure this author would agree that a good abstraction is better than code duplication.
I'm afraid this comment reads in a rather gnomic way.
Of course it's a truism if you just say any abstraction that works is a good abstraction.
That is not what I am saying at all. Bullshit abstractions at least let you control the problem. Duplication doesn't.
But it’s never going to be 1:1 duplication is it? Sometimes it’s better to copy code as a template for something new, rather than try to immediately force a new abstraction.
I agree with you that it’s a truism, but it’s useful advice for people who have a habit of trying too hard to DRY their code. IIRC the author comes from the Ruby world, where DRY was a big thing, and this talk was part of the pendulum swinging back away from this DRY obsession that sometimes just resulted in convoluted code.
You seem to have experience. I don't mind factoring / unifying logic when it's done sensibly, with enough history in the trenches. It pains me more whenever a young dev comes in and barks "we must merge these two things!" repeatedly without planning for more than two cases, and starts adding more and more boolean variables. Crystal makers. Then the obvious issue comes: the two variants weren't that close, and now there's one god class trying to handle all forces in one big state.
I agree that LLMs are naturally anti-abstraction machines. I'm often trying to find a way to reverse that.
> I agree that LLMs are naturally anti-abstraction machines. I'm often trying to find a way to reverse that.
I am a bit of an LLM cynic but I am trying to learn it all, and I have to say I have spent most time trying to work out: how do you explain how a brown-field codebase actually works, in such a way that the LLM won't pervert it through misunderstanding.
It does encourage you towards the "conventional" coding standard for any new project, because you want to use a pattern that it will have seen in its training set.
But for example there are differences of opinion in how wordpress plugins (which have a very complex control flow) should be structured. LLMs are incredible at knowing how WP works, actually, but what is difficult is explaining how your methodology for a large plugin is going to work.
It is a battle — but a useful one because it can be used for, er, studying the comparative belief systems of the LLMs.
They don’t have a useful belief system, one of the rookie mistakes of using LLMs is asking them what you “should” do
So you centralize 3 liners?
Over-engineering and "abstraction hell" are very much not iconoclastic concepts
I think you applied this idea to the era of LLMs, but consider an abstraction that takes in multiple god structs for branches it may or may not call in the case you are looking at, and has a lot of if conditions that explode in combinatorial complexity across a deep call chain. Now the bottleneck is that you need to call this function 144 times a second. That is where you start to have clusters of hot code paths where the latency stacks depending on the angle the god structs come in. Not sure what LLMs do here, I don't vibe code.
I am applying it to LLMs on the basis of twenty years of seeing smaller programming shops tie themselves in knots by using duplication to avoid developing an abstraction that would help them because they were unsure of it.
Everyone always thinks duplication is fine when you can bill the modifications by the hour. But they never think to understand that the reason they've had so many employees is that they've turned their change process into firefighting all the different versions of the same code and all these young developers burn out from the sheer anxiety of not knowing where all the little fires are.
I once had to rescue a site that had become a victim of its own popularity, that was written by subcontractors who clearly believed that duplication is better than the wrong abstraction.
Until one day, along came a change — MySQL 4 to MySQL 5 — and a significant duplicated query no longer worked due to its new, proper strictness.
The problem was compounded; not only was the broken pattern in hundreds of places where it had sat, stable and predictable, but the pattern was broken because it, itself, was avoidance of another abstraction that would solve it.
They quit: they said they couldn't and wouldn't fix it. It had always worked how they had done it, and it would have to stay on MySQL 4 (which the hosting provider refused to accommodate).
I don't think it helped that they were severely misguided in their understanding of SQL, but the code had become beholden to duplication and then crippled by a new problem in the duplicated pattern.
I had to first find all the contexts in which that pattern appeared (which required me to spend half a day on a bespoke script) and then work out a new pattern and as few variations of it as possible to fix the duplicated code in each place, because there was no proper budget to rewrite the whole thing. And then I sat at my desk, for days, working through each one, figuring out how to change it to fit the slightly different expression of the pattern.
Even a total bullshit abstraction would have saved that client both time and money. And this is only one of dozens of times I've seen small firms simply duplicate and change code that would later become unmaintainable because of a straw breaking a camel's back.
Again, this is the opposite of what the author argues for, which is waiting for a couple instances before committing to an abstraction. Not duplicating a SQL query across hundreds of places.
I would be curious if the previous coders you're talking about actually cited duplication as a good thing. You seem to be implying they are. But almost every instance I've seen of massive code duplication was just from bad programmers shooting from the hip, not from some ideological stance.
> Again, this is the opposite of what the author argues for, which is waiting for a couple instances before committing to an abstraction. Not duplicating a SQL query across hundreds of places.
Right. But this is a hypothetical, in-a-vacuum situation.
In the real world, your two, three duplicates are in production.
"We really should now de-duplicate this"
"There is not the time or budget, just copy it again; we'll replace all this one day".
> I would go as far as to say that any abstraction you can maintain (that is in active maintenance, I mean) is better than code duplication once you are past a de minimis threshold.
Pretty much everyone arguing for duplication has argued what you are saying, which is wait to see a few instances of it before committing to an abstraction. No one is saying duplicate everything 100 times. So I don't think this discussion was ever iconoclastic.
The point is it sounds all smart and sophisticated and principled in the abstract environment of a code discussion in a blog post.
In the real world, duplication happens in an emergent way, there isn't the time each time to judge whether it's really time to just quietly abstract that code, you may not get the permission, budget or window to do it, and if you don't stop the rot really early you are locked into the pattern.
Huh? If anything having lots of customers makes the argument for duplication stronger. The issue is almost always once you get huge and 5 product teams are trying to achieve 5 different goals by using the same overwrought abstraction instead of just copying and decoupling. The abstractions that are actually stable end up becoming libraries or platform team owned systems that no one ever really touches.
Too many abstractions are bad. Too much code duplication is bad.
Part of being a good engineer is finding the right balance.
I know engineers who would gladly duplicate code all over the code base to avoid creating a new abstraction.
I know engineers who create polymorphic abstractions for a single caller with a very obvious set of parameters.
So much of wisdom is in finding balance and not being dogmatic about rules.
I feel like the balance has shifted over the last 30 years, and is speeding up. Semi-automatic and fully automatic re-factoring has made dealing with duplicated code much faster, cheaper and safer. Changing abstraction is still high risk.
I have regularly watched agents forget to update one duplicated pattern after changing it somewhere else. If it's within a single file or related class, it'll catch it, but if it's off in some other package in the monorepo, it's a crapshoot.
Changing abstraction is a high risk unlike agents refactoring scores of almost identical code.
I thought this discussion was limited to situations where you care about code quality
I used to struggle with abstractions back in my OOP days but since moving pretty much to a purely functional approach I find that code duplication is rare. Just have a function and call it in two parts. The main abstraction issue is then data structures but with TypeScript interfaces being duck typing essentially I run into few problems there as well.
So code duplication because of abstraction issues is rare. Code duplication because of siloed developers is so much more common.
For hobby, I use functional languages, and I find the techniques are the important bits to remember. Most modern languages let you easily stand on functional programming theory. You don't need to know Haskell. Everyone's brain works differently, but the idea of small, simple and occasionally flexible parts building a whole works for me. As opposed to the large complex do it all shape shifting machine.
what exactly is 'calling a function in two parts'
I assume they mean to call the function from two (or more) parts of the code (i.e. locations). It's not immediately apparent why this is meaningfully different from what would be possible in Java though, since ostensibly a function is the same as a method with the receiver moved into the list of parameters. (There are some things you can do in a Java method that don't translate to most functional languages, like invoking the version of the method from a superclass, but nothing in the language forces you to do any of those, so it seems a bit strange to claim that the language itself is the issue rather than the specific patterns that were chosen, maybe by their coworkers, or that were just not common in the ecosystem.)
I read it as „calling it from two places“
I believe they’re referring to callbacks / dependency injection / higher order functions to customize the behavior of a function?
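If the higher-order-function reading is the right one, a small sketch of it might look like this. The `total` function and the order fields are made up for illustration; the point is only that one shared function is called from two places, and each call site passes in the bit that differs.

```python
from typing import Callable, Dict, Iterable

Order = Dict[str, object]


def total(orders: Iterable[Order], include: Callable[[Order], bool]) -> float:
    """Sum the amounts of the orders selected by the caller-supplied predicate."""
    return sum(float(o["amount"]) for o in orders if include(o))


orders = [
    {"amount": 10.0, "status": "paid"},
    {"amount": 5.0, "status": "refunded"},
    {"amount": 2.5, "status": "paid"},
]

# Call site 1: revenue report only cares about paid orders.
paid_total = total(orders, lambda o: o["status"] == "paid")

# Call site 2: refund report reuses the same function with a different rule.
refunded_total = total(orders, lambda o: o["status"] == "refunded")
```

Nothing here is specific to functional languages; the same thing works with a Java method taking a `Predicate`, which is roughly the parent's objection.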
Echoing the article, anyone who has experienced both will agree: it’s far easier to work with an under engineered code base than an over engineered one.
> Programmer A sees duplication.
This step should also be parameterized by how many times the duplication has occurred. Refactoring preemptively may lead to poor abstractions, but not refactoring after seeing the exact same thing tens of times would also be weird. See also:
https://wiki.c2.com/?DuplicationRefactoringThreshold
https://wiki.c2.com/?ThreeStrikesAndYouRefactor
+1 The worst code I had to maintain was code that tried to follow DRY (without the trying to understand what the original intention of that principle was). The only way out of that mess was widespread code duplication.
i recall very early in my career i did exactly this. i took what worked and duplicated it, my reasoning being that it was far safer to reuse what had been battle tested and leave refactoring for a later stage
it wasn't received well and a senior developer told me that 'good developers know exactly what patterns to use all the time before writing any piece of code and that he will clean up my mess'
long story short his refactoring turned what was otherwise a stable system into a complete mess and it reminded me of Nassim Taleb's book
It's definitely an "it depends" thing. It's easy to overabstract. On the other hand, I've also met junior developers who just didn't know how to use function parameters.
You can do both with microservices!
Please, stop it
Except 9/10 times microservices end up wildly dependent on each other, yielding a distributed monolith. Better to use service oriented architecture and just ship the monolith, you can test easier and skip the extra layers of serialization / deserialization.
I think you missed GP's point
But wait! There's more!
For $19.95, you can replace your single single point of failure with multiple single points of failure!
Or for 100$, get a 5x increase on all failure points - maximum vibes, maximum excitement.
Nobody wants to listen. Nobody. In 90% of the companies there are some so called senior devs that get ecstatic when they create a new abstraction.
Overengineering, abstractions and premature optimisation are the 3 worst plagues of engineering.
At the same time I’m happy they exist because it means we’ll always have a job.
I believe that "single source of truth" is a principle that should always be followed. If there's duplicated code where it'd be a bug if they diverge, then you should refactor. It creates a long-distance coupling in your code that may be invisible to future developers until a bug emerges.
But with that in mind, I mostly agree with the article: if it's not a violation of "single source of truth", then abstractions are just a convenience. If it starts being inconvenient, then it's not doing its job and there's no reason to use it. It's a serious code smell if a function needs several flags for custom behavior; that means it's probably the wrong abstraction or violating the single responsibility principle. If there is a legit need for lots of customization, an often-good way to handle is to take a function/functor as an argument for the customization. E.g., rather than `solve(f:double -> double, max_iters = 99, x_abs_tol = 1e-15, x_rel_tol = 1e-15, ...)` you can do `solve(f:double -> double, stopping_criteria: StoppingCriteriaClass)`
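A rough sketch of the second signature, with the stopping rule pulled into its own object. The comment doesn't say what `solve` actually computes, so the fixed-point iteration below is purely a stand-in, and `StoppingCriteria.should_stop` and the `x0` argument are assumed names, not anything from a real library.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoppingCriteria:
    """All the 'when do we stop' knobs live here instead of on solve()."""
    max_iters: int = 99
    x_abs_tol: float = 1e-15
    x_rel_tol: float = 1e-15

    def should_stop(self, iteration: int, x_old: float, x_new: float) -> bool:
        step = abs(x_new - x_old)
        return (
            iteration >= self.max_iters
            or step <= self.x_abs_tol
            or step <= self.x_rel_tol * abs(x_new)
        )


def solve(f: Callable[[float], float], x0: float,
          stopping_criteria: StoppingCriteria) -> float:
    """Stand-in algorithm (assumption): fixed-point iteration x = f(x)."""
    x = x0
    iteration = 0
    while True:
        x_new = f(x)
        iteration += 1
        if stopping_criteria.should_stop(iteration, x, x_new):
            return x_new
        x = x_new


# Fixed point of cos(x), stopping once successive iterates agree to 1e-12.
root = solve(math.cos, 1.0,
             StoppingCriteria(max_iters=200, x_abs_tol=1e-12, x_rel_tol=0.0))
```

A caller with an unusual stopping rule can subclass or replace `StoppingCriteria` without `solve` growing another flag.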
I once used code duplication to implement a fourth type of dialog that looked somewhat similar to the others, which were sharing a lot of code, because I felt that although it looked much the same as the others, there was some fundamental difference. Took me about a day to implement. When some other engineer saw this, he spent the next three weeks trying to integrate all of them with some shared class. His work was not completely worthless, because he did find some small bug during all his efforts to avoid any possible code duplication. I had already predicted that it would take a lot of effort, but I did not object, because I hoped that he would learn something from it and think twice next time before always trying to avoid code duplication.
Two talks come to mind here: Mike Acton's Data-Oriented Design and C++ [1] and Bryan Cantrill's The Complexity of Simplicity [2].
Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions. I can't do the talk justice, but it's had a big impact on me.
Bryan's talk is about abstraction generally, and how it's difficult to find the "right" abstraction.
1. https://www.youtube.com/watch?v=rX0ItVEVjHc
2. https://www.youtube.com/watch?v=Cum5uN2634o
> Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions.
I've always found it odd when even fairly smart engineers sometimes prioritize real-world metaphors over the actual needs of the codebase. Years ago, when I was only a few years out of school, I was implementing a connection pool in Rust, and the most reasonable way to implement it was to have the connection hold a weak reference to the pool so that it could get checked back in automatically when dropped. My manager (an extremely experienced engineer) didn't like this idea because "a library holds library books, not the other way around". I didn't feel like this was a compelling reason to design things differently, but he refused to engage with the issue in any way other than through the lens of that metaphor. Eventually the impasse was resolved when one of the other managers in my department suggested that while library books don't contain libraries, they do have the name of the library stamped in the back as a reference to where they should be returned, and I guess my manager found this to be a reasonable extension of the analogy. If I had been more experienced, maybe I would have recognized that I could find a way to engage with the analogy like the other manager did without ceding the point, but even today I still feel that it was completely bizarre to insist on that as the canonical way to frame things rather than just considering the ramifications of the abstraction in the code and the experience of using the library based on it.
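For anyone who hasn't seen the pattern, here is a loose Python analogue of the design described above, not the original Rust code. Python has no `Drop`, so a context manager stands in for "checked back in automatically when dropped"; `Pool`, `PooledConnection` and the string "connections" are all invented for the sketch.

```python
import weakref
from typing import List


class Pool:
    def __init__(self, size: int):
        # Strings stand in for real connections in this sketch.
        self._idle: List[str] = [f"conn-{i}" for i in range(size)]

    def checkout(self) -> "PooledConnection":
        raw = self._idle.pop()  # raises IndexError when the pool is empty
        return PooledConnection(raw, self)

    def idle_count(self) -> int:
        return len(self._idle)


class PooledConnection:
    def __init__(self, raw: str, pool: Pool):
        self.raw = raw
        # The "stamp in the back of the book": the handle knows where to
        # return itself, but a weak reference does not keep the pool alive.
        self._pool = weakref.ref(pool)

    def __enter__(self) -> "PooledConnection":
        return self

    def __exit__(self, *exc_info) -> bool:
        pool = self._pool()
        if pool is not None:  # the pool may already have been destroyed
            pool._idle.append(self.raw)
        return False  # never swallow exceptions from the with-block


pool = Pool(2)
with pool.checkout() as conn:
    in_use_idle = pool.idle_count()  # one connection is checked out here
after_idle = pool.idle_count()       # and it is back once the block exits
```

The caller never has to remember to hand the connection back, which is the usability argument for the back-reference regardless of what the library metaphor says.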
It depends on whether the duplication is accidental or real. I.e. if two taxes are using the same formula, it is accidental. If you use the same physics formula in multiple places, it is real duplication.
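A tiny illustration of that distinction, with made-up tax names and a made-up 5% rate. The two tax functions look identical today but answer to different rule-makers, so merging them would couple things that can change independently; the physics formula is the same fact everywhere, so it gets one definition.

```python
def city_tax(amount: float) -> float:
    # Hypothetical rate; it merely happens to match the state's right now.
    return amount * 0.05


def state_tax(amount: float) -> float:
    # Same formula by coincidence. Either rule can change on its own,
    # so keeping the two separate is not really duplication.
    return amount * 0.05


def kinetic_energy(mass_kg: float, speed_m_s: float) -> float:
    # Real duplication if copied around: every caller means the same formula,
    # so a single shared definition keeps them from diverging.
    return 0.5 * mass_kg * speed_m_s ** 2
```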
2016 (up to 2018 or so) may have been the peak of such varied activity in the developer ecosystem, including articles like this, whether it was discussion, ideation, OSS variety, language development.
There has been growth since but it's been concentrated into fewer channels and somewhat industrialized.
Duplication is often a small price to pay for isolation
Yes. I’m dealing with a graphql, urql, Next, Prisma stack at the moment. Something that would be a handful of lines of code in a different stack ends up being hundreds in this one.
The Node ecosystem is full of wrong abstractions.
The problem is self-inflicted. You do not need to keep jumping to the next trendy framework.
I inherited this one. My preferred SPA stack these days involves Porsager’s Postgres library, a simple RPC stack with Zod schemas, and Preact.
Even better is an old school MPA with progressive enhancement.
I don't know about you, but I generally don't write code in a vacuum. Other people may have touched it before me. Those other people may have made poor decisions.
Not that I'm immune from choosing the wrong abstraction sometimes. More than once the "other people" was me. We all make mistakes.
Of course, but we should all be doing our best to push back against unnecessary framework churn.
if the majority of the team agrees, sure. but if i am in the minority then i'll appear uncooperative, and that may not be a position i want to be in.
Do you want us to call the previous company and explain what their framework choices did?
my paycheck needs me to.
Nextjs in particular is a dumpster fire, it's a shame that it's the default stack many LLMs slop out.
Interface over inheritance is the paradigm I try and stick to. I'd rather maintain orthogonal code than code with overuse of inheritance because of over adherence to DRY.
While I see the point, I think I more often encounter the opposite. Duplication, but not exactly duplication. Then the "sunk cost fallacy" is not an issue but there is a huge maintenance cost and no one feels like refactoring it. I'd rather refactor a bad abstraction than 10x duplication.
but those are exactly the cases where the distinction matters. when you have a situation where you can't duplicate the code exactly, then you really have to look carefully if this is actually the right place for a shared abstraction. i tend to wait and see if i can refactor one or the other to get them to be exact duplicates and only then see if i can fit in a common abstraction. and yes, finding that i later need to make the same change in both places is a sign that a common abstraction is probably the right call.
How I see this:
Refactoring code to reduce the number of lines is _compression_, akin to RLE coding.
Refactoring the code to lift conceptually coherent parts is _abstraction_.
Less compression, more abstraction. Then you're fine.
Duplication is fine, triplication and above is the issue.
Triplication tends to be where it becomes more clear what the correct thing to abstract or de-duplicate is.
It's of course possible to functional-ize segments of logic, but then the question of state mutation must be brought up. How isolated are these changes from other parts of the code / system state. Can this be run in parallel or is it something that must be serial? What potential race conditions exist?
I've seen the pendulum swing between duplication and abstraction a few times in my career, and I'm currently on team "it's usually not that hard to find a good abstraction up front."
IMO it's easier to inline a bad abstraction than it is to consolidate a bunch of subtly different things that should have been abstracted from the beginning.
But I expect people's opinions on this differ wildly based on their personal experiences. Just my anecdotal take.
Depends. If the abstraction is just a level of indirection, then it is usually pretty simple to eliminate - just hit “inline function” in the refactoring tool a few times.
On the other hand it is pretty difficult and error-prone to consolidate duplicated code which has drifted apart over time.
If in doubt, choose the approach which is simplest and least risky to revert if you discover in the future that you made the wrong choice.
I do agree a bad abstraction can cause huge problems. But it’s usually not the kind of abstractions introduced to eliminate code duplication, but the kind of top-down “architecture astronaut” abstractions, where a model is chosen which does not fit the complexity of the problem.
The discussion around this topic would be nicer if the title had "can be" instead of "is".
Otherwise what is better is better and we don't know what we don't know
I once had to work with a system that was refactored and abstracted away heavily to use Redux. It didn't work then, the implementation had way too many abstractions, doing any change meant you had to touch dozens of files. It was insanity. Left me with a bitter taste regarding the redux pattern for ever (probably not the pattern's fault).
Over-abstraction is as much of a problem as under-abstraction. If the abstraction isn’t improving your ability to produce good code, it’s a bad abstraction. I’ve worked with a lot of abstraction patterns in a lot of languages over the 30 years of my career. Any of them can be good or bad. Unthinkingly applying them is always a problem.
Unsurprisingly, that goes for just about any idea in software development. I worked in one code base that heard small functions are best, so every function was less than three lines long. You don’t gain anything by replacing `lst.get(0)` with `get_first_item_in_list(lst)` (in fact, understanding becomes much more difficult), but breaking down functions into the smallest units that make sense independently within the business domain can be very helpful, both for understanding and testing.
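To make that contrast concrete, here is a rough Python sketch (so `lst[0]` stands in for `lst.get(0)`). The order fields and the "oldest paid order ships first" rule are invented for illustration, not taken from any real code base.

```python
# A wrapper that adds a name but no meaning: the reader still has to
# open it to learn that it is just an index lookup.
def get_first_item_in_list(lst):
    return lst[0]


# Hypothetical domain-level unit: it names a business rule and can be
# tested on its own. Orders are assumed to be dicts with "paid" and
# "placed_at" keys.
def next_order_to_ship(orders):
    paid = [o for o in orders if o["paid"]]
    return min(paid, key=lambda o: o["placed_at"]) if paid else None
```

The second function is also small, but it makes sense independently within the domain, which is the part that helps with understanding and testing.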
100% agreed. It's interesting since I see over-abstraction often abused by "clever" engineers (sometimes quite experienced actually). Sometimes I wonder if they do that to make themselves indispensable on purpose and create their silos in the codebase.
Twice a coincidence, thrice a pattern.
If you work backward from the schema these sorts of things tend to evaporate before they can become a problem.
Some of the biggest rabbit holes come from naming conventions not aligning across the business and technology silos. If everyone agrees that Customer has exactly 34 attributes, then it is possible to move to the next step of sharing libraries of types across the team. Getting your POCOs/DTOs 1:1 across the board is when the duplication really starts to melt away.
I watched a talk by her about this, and this post is missing half of the equation, which is really important:
Having a wrong abstraction means you end up with a class/function/module with a huge number of configurations through boolean/enum parameters. It's not even clear that all combinations of configurations are even valid. This situation may be simplified by duplicating, and then eliminating code, thus creating more streamlined code for each use case. This may require fixing similar or cross-cutting bugs in multiple places (e.g. JSON serialization is stupid, need to hack a workaround), but keeps the business logic changes simple. Maybe a bit more numerous, but the code is able to raise all the scenarios to consider.
Having no abstraction means you may have to change business logic consistently in multiple places, or you have to fix exactly the same misconception (aka a bug) in multiple cases. e.g. tax rate management in a multi-national context. This is also terrible, because you may fix an important problem in one place and forget other places with the same issue. Now you missed 12 potential bugs by fixing one. This can however allow you to discover a true abstraction. Maybe these 12 places should call just one place?
But for code evolving across a team understanding this tension, a bit of duplication while waiting for confirmation that these pieces of code break together and change together is better than just shoving the same 3 if-statements into a function to avoid "line duplication". Concept duplication is more important.
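A minimal sketch of the "duplicate, then eliminate" move described above, in Python. Everything here (the function names, the flags, the pricing rules) is hypothetical and only meant to show the shape of the problem.

```python
# Hypothetical example; the function names and pricing rules are made up.

# One shared function steered by flags. Not every combination is meaningful:
# tax_rate is silently ignored unless include_tax is also set.
def render_price(amount, *, include_tax=False, tax_rate=0.0, as_invoice=False):
    total = amount * (1 + tax_rate) if include_tax else amount
    label = "Invoice total" if as_invoice else "Price"
    return f"{label}: {total:.2f}"


# After duplicating the function per caller and deleting what each one never uses:
def render_shelf_price(amount):
    return f"Price: {amount:.2f}"


def render_invoice_total(amount, tax_rate):
    return f"Invoice total: {amount * (1 + tax_rate):.2f}"
```

The two streamlined functions repeat the formatting, so a formatting bug now has to be fixed twice, but neither of them can be called with a combination of flags that nobody intended.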
If it's duplication, it's the same abstraction by definition. The fundamental unit of programming is intent, not code.
The "99 Bottles of OOP" book mentioned at the bottom was an excellent introduction to refactoring. I highly recommend it if you struggle with finding the right data models for the problems you work on.
Yes, if your programming language/environment is weak.
This is the biggest lesson I got from LLMs. I have a 1 million LOC vibe coded project that I can only imagine would fit in a few hundred thousand lines. But it's still holding up, I expected some kind of development collapse long before this point.
I don't think that's a good lesson.
OP is right that code duplication is far cheaper than the wrong abstraction, but the opposite is also true - the right abstraction is far cheaper than code duplication.
It's made me wonder the same, but most LLM generated codebases haven't been around long enough to judge maintainability. I have noticed issues in some of my more LLM heavy code when I expect a change to be replicated in multiple areas, assuming common code / styling was reused, only to find it wasn't. It's for that reason I can't use LLMs for client codebases without heavy scrutiny of every line generated (for my own hobby projects I'm a lot more lenient)
Well sooner or later I would expect a developer who intimately understands their code base to feel compelled to start refactoring and extracting fitting, meaningful well-leveraged abstractions.
The problem with coming up with a rule that works for everyone is that everyone has a different idea of what makes a good abstraction.
Do you want to iterate using for loop or using .iter().step(2).map()?
I would rather have consistency than a mixed bag of levels of abstractions.
> Do you want to iterate using for loop or using .iter().step(2).map()?
This isn't really a good example, assuming both can be used to represent the same thing.
The problem with the wrong abstraction is when your abstraction doesn't let you represent something. Then, because you've already invested so heavily into it, you start contorting the problem to fit your abstraction and it becomes a shit show.
> Do you want to iterate using for loop or using .iter().step(2).map()?
I don’t think it matters, especially for short loop scopes
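For what it's worth, here are the two styles side by side in Python, with `itertools.islice` and `map` standing in for the `.iter().step(2).map()` chain. The "double every other element" task is just an assumed example.

```python
import itertools


# Explicit loop: walk every second index and collect the doubled value.
def doubled_every_other_loop(xs):
    out = []
    for i in range(0, len(xs), 2):
        out.append(xs[i] * 2)
    return out


# Iterator chain: take every second element, then map over the result.
def doubled_every_other_chain(xs):
    return list(map(lambda x: x * 2, itertools.islice(xs, 0, None, 2)))
```

Both return the same list for the same input, which is why this reads as a style and consistency question rather than a case of an abstraction that can't represent the problem.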
Code duplication is the wrong abstraction too -- unless it's not really code duplication but code that only happens to be similar for some really "unstable" reason.
I would agree that there are good "de minimis" reasons not to abstract code that isn't ready to be abstracted at all. If the pattern has not settled it shouldn't be forced into an abstraction (beyond those that make sure it is e.g. not vulnerable)
But beyond that, any stable abstraction is better than duplicated code.
The smallest amount of simple code that solves the problem wins. Everything else is irrelevant.
I’ve seen code bases that evolved like that. The problem is almost always outside the abstraction that has a pile of conditionals.
Usually, some moron decided to copy paste things a few levels up and then the top half of the system metastasized into two parallel universes of broken garbage.
For instance, one might decide to perform auth later in the flow so unauthorized handlers can run and set a “this requires auth” bit that defaults to false, and the other flow could add a forged auth header before the auth step.
Now, the auth handler needs an “allow forged header” flag and an “already authenticated” flag.
I’ve seen that grow to a half dozen cases until massive production dataloss occurred. A buggy client tried to delete something local to their account without specifying a userid as a parameter (this codebase was garbage!) and deleted the something for all users instead.
I can’t remember how the dataloss was “fixed”, but it definitely wasn’t “all requests go through a simple auth check, and all handlers declare/implement their auth requirements in the same way”.
Getting a design approved to require a user id be specified exactly once for account-level operations was fantasy land for that team. (Most hires with any sort of engineering talent bounced in under a year.)
Anyway the “abstractions are hard so copy paste” approach did provide job security for the lifers on that product. I can’t imagine them holding a job elsewhere, but they were completely immune to layoffs (hostage style).
This is a pretty valid approach if you’re an agent hired to perform industrial sabotage, or if you keep replacing keyboards after you gnaw through the corner.
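For contrast, a rough sketch of what "all requests go through a simple auth check, and all handlers declare their auth requirements in the same way" could look like. This is not that product's code: the registry, the decorator, and the handler names are all hypothetical.

```python
# Hypothetical sketch: a single registry and a single auth check.
HANDLERS = {}


def handler(name, *, requires_auth=True):
    """Register a handler and declare its auth requirement in one place."""
    def register(fn):
        HANDLERS[name] = (fn, requires_auth)
        return fn
    return register


@handler("health", requires_auth=False)
def health(user_id, params):
    return "ok"


@handler("delete_item")  # auth is required unless a handler opts out
def delete_item(user_id, params):
    return f"deleted {params['item']} for user {user_id}"


def dispatch(name, user_id, params):
    """Every request passes through the same check before its handler runs."""
    fn, requires_auth = HANDLERS[name]
    if requires_auth and user_id is None:
        raise PermissionError(f"{name} requires an authenticated user")
    return fn(user_id, params)
```

With this shape a delete request that arrives without a user id is rejected before the handler runs, and the user id is passed exactly once, by the dispatcher.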
> Code duplication is far cheaper than the wrong abstraction
Very true in some sense, but I continue to encourage DRY-bias because I've literally never seen teams duplicate code responsibly and later dedupe it when it's the right time. 95% of the time this sentiment is quoted to justify shipping quick slop and stable reusable bits are never extracted into a shared lib later.
In my experience if your organization can't commit to doing WET (write everything twice) code then it probably also will fail at doing DRY (don't repeat yourself) code
Maybe this is an area where AI can help identify duplicate code though to show opportunities for de-duping
(2016)
Some previous discussions:
2023 https://news.ycombinator.com/item?id=35927149
2021 https://news.ycombinator.com/item?id=27095503
2020 https://news.ycombinator.com/item?id=23739596
2018 https://news.ycombinator.com/item?id=17578714
2016 https://news.ycombinator.com/item?id=11032296
I prefer the go mantra: a little copying is better than a little dependency.
Abstraction is a vague term when used here. Is a shared function an “abstraction”? It’s more like implementation hiding, maybe some data hiding. But you definitely have a dependency on it now.
Acronyms like DRY are for beginners. Once you get good you know when to break the “rules” (and when not to).
Oh the self-contradiction here...
Generalizing this in the abstract is a wrong abstraction.
No it's not.