
That can be boiled down to the “Rule of 3”.

My CTO often asks me to implement a feature to do X and make it “generic enough to handle future use cases”. My answer is always the same - either give me at least three use cases now or I am going to make it work with this one use case. If we have another client that needs the feature in the future then we will revisit it.

Of course, there are some features where we know in advance, based on the industry, how to genericize them.

From (I think) an old Joshua Bloch talk on API design, paraphrased:

* If you generalise based on one example, you will get a flexible API that can handle only that example.

* If you generalise based on two examples, you will get a flexible API that can switch between those two examples.

* If you generalise based on three examples, you have a chance of abstracting over the common essence.

Hmm. I wonder if he got that from Simon while he (JB) was at CMU. He (HS) once said to me, jokingly, I think, “One makes an observation, two makes a generalization, three makes a proof”.

Sounds good but is wrong. Sometimes you need more than 3. Maybe 3 is for those who need to move fast and break things though.

Yeah. I think he (HS) was joking. He had a Nobel, and (co)invented AI. I’m pretty sure he knew what a proof is.

Yes, he mentions it in Effective Java. When designing APIs, collaborate with 3 clients if possible.

Until an example that “doesn’t fit” ?

And if the example “doesn’t fit”, it’s by definition a different thing than you originally modeled.

The Rule of 3 is a great rule, except when it isn't.

I had a colleague some time ago who wrote a couple of data importers for FAA airspace boundaries. There were two data feeds we cared about, "class airspace" and "special use airspace". These airspace feeds have nearly identical formats, with altitudes, detailed boundary definitions, and such. There are a few minor differences between the two, for example different instructions for when a special use airspace may be active. But they are about 95% the same.

The developer wrote completely separate data definitions and code for the two. The data definitions mostly looked the same and used the same names for corresponding fields. And the code was also nearly the same between the two (in fact exactly the same for most of it).

It was clear that one importer was written first, and then the code and data structures were copied and pasted and updated in minor ways to create the second.

Because the data structures were unique for each (even if they looked very similar in the source code), this impacted all the downstream code that used this data. If you saw a field called FAA_AIRSPACE_MIN_ALTITUDE, you had to be sure not to confuse the class airspace vs. special use airspace versions, because each of these had a field of the same name, the compiler wouldn't catch you if you used the wrong one, and you might have the wrong offset into the data structure.

I asked the developer and they told me, "I have this philosophy that says when you have only two of something, it's better to just copy and paste the code, but of course when you get to the third one you want to start to think about refactoring and combining them."

Yep, the Rule of 3.

In this case there were only ever going to be two. And the two were nearly the same with only minor differences.

But because of blind obedience to the Rule of 3, there were many thousands of lines of code duplicated, both in the importer and in its downstream clients.

I still like the Rule of 3 as a general principle, and I have espoused it myself. But I think it is best applied in cases where the similarities are less clear, where it seems that there may be something that could be refactored and combined, but it's not yet clear what the similarities are.

I think it is a very bad rule in a case like this, where it should be obvious from the beginning that there are many more similarities than differences.

Every single rule or piece of advice in programming is good until it isn't. OOP is good until it isn't, functional programming is good until it isn't, premature optimization is the root of all evil until it is the root of all good.

For some reason humans have this deep need to try to boil things down to bulleted lists, which in the domain of programming are just incredibly not useful.

This is because programming is an art. It has fundamental components (objects, functions, templates, iterators, etc) like in visual design (point, line, shape, value, etc).

I think engineers should read the old and the new C++ books and master that language to know the evolution of all these paradigms and how to use them. There’s so much wisdom in the “Effective C++” series and Gang of Four and “C++ Templates: The complete guide“ to name a few.

Problem is in this “start up culture” to bang things out and get them working the art is left behind. Just like many other arts.

I was part of an uncredited team interviewed by Meyers for "Effective Modern C++". Always thought it was somewhat ironic. At the firm (which declined acknowledgments because they shun media attention) we weren't even using a C++11 compiler on either Linux or Windows at the time. Yet at least two of the patterns, I recognize as being at least a partial contributor to.

Myself, I lost track of C++ standards changes with C++17, and I've not been using C++ for the last several years.

I still love the power and speed, but right now I'm doing more ETL work, and Python is a better and more productive language for that.

I think that you have a point but I often find myself citing guidelines or rules when I am evaluating code decisions or questioning code design. Maybe it depends on your interpretation of the phrases, some sayings should be followed religiously but others applied with discretion.

well said. for a while I started treating everything as a soft rule, more like guideline. it gets easier then :)

Consider them to be heuristics, not rules.

The rule of 3 usually is in reference to small scoped abstractions, not whole modules or subsystems. We're talking about extracting a short function, not significant and potentially thorny chunks of code.

But I guess no one explicitly spells this out, so I could see where someone could become confused.

Premature abstraction. Rule of 3 helps. But I found better principle for it:

Any abstraction MUST be designed to be as close as possible to language-primitive-like. Language primitives are reliable, predictable, and non-breaking; when they do change, they don't affect business logic written on top of them. If parameters are added, defaults are provided. They don't just abstract, they enable developers to express business logic more elegantly.

The challenge is to pick the part of the code abstractable to be primitive-like first and make it a top priority.
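One concrete reading of "if parameters are added, defaults are provided" is that an abstraction can grow a new knob without breaking existing callers. A minimal sketch in Python (the retry helper and all its names are hypothetical, invented for illustration):

```python
import time


def fetch_with_retry(fetch, retries=3):
    """v1 of a primitive-like helper: call fetch() until it succeeds,
    re-raising the last error once retries are exhausted."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise


def fetch_with_retry_v2(fetch, retries=3, backoff_seconds=0):
    """v2 adds a backoff knob. Because it defaults to 0, every existing
    caller keeps working unchanged -- the abstraction grows without
    breaking the 'business logic' written on top of it."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_seconds)
```

Callers written against v1 can migrate to v2 without edits, which is the "non-breaking, primitive-like" property the comment describes.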

This is why language features like Rust async, Go channel-based communication, ES6, and C++ smart pointers were such a hype in their time and are still used now. It also applies to enabling tools such as React, tokio, wasm-bindgen, express, TypeScript, and jQuery (even the last, which is not really a thing anymore).

I find abstractable parts in code by thinking about potential names for it. If there is no good name for a potential common function, it's probably not a good idea to extract the section into a function. Maybe it can be extracted into two separate functions with good names?

This is a proxy for having a well-defined purpose for a function.
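The naming test can be illustrated with a small hypothetical: a repeated section that both parses a header line and counts its fields has no honest single name ("parse_and_count"?), which suggests it is really two extractions, each of which names itself:

```python
def parse_header(line: str) -> dict:
    """Well-named piece #1: turn a 'key=value;key=value' line into a dict."""
    return dict(pair.split("=", 1) for pair in line.strip().split(";") if pair)


def count_fields(header: dict) -> int:
    """Well-named piece #2: how many fields the header carries."""
    return len(header)
```

Both functions have a well-defined purpose that matches their name, which is exactly the proxy the comment describes.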

Yes, absolutely, but it's one that's easier to teach juniors.


Yes, exactly, and it is such a good rule to go by so much of the time. But like many rules, you need judgment to know when to apply it.

Rules are a poor substitute for actual thought.

> Rules are a poor substitute for actual thought.

This should be the guiding principle of life!

The guiding principle... a rule, so to speak!

Everything in moderation, even moderation itself.

> Everything in moderation, even moderation itself.

This was my guiding principle in life, and I thought I was pretty clever for coming up with it, but later found out that someone far more clever got a lock on the phrase in history: https://www.goodreads.com/quotes/22688-everything-in-moderat...

No, no, no; it's "Everything in moderation, especially moderation.". :)

Really? I'm sure you know that all generalities are false.

Only a Sith speaks in absolutes

You nailed it. I will print this and hang it on the wall in our office.

I thought rules were a way of organizing thought, like a language which is governed by rules.

> The rule of 3 usually is in reference to small scoped abstractions, not whole modules or subsystems. We're talking about extracting a short function, not significant and potentially thorny chunks of code.

I would say it’s for entire features sometimes. For instance, we are a B2B company. We have features on the roadmap or a feature might be suggested by a client. Either way, you hit the jackpot if you can get one client to pay for a feature that doesn’t exist in your product that you can then sell to other clients.

The problem is that you don’t know whether the feature is generally useful to the market. But you think it might be; in that case, you build the feature in a way that is useful to the paying client and try not to do obvious things that are client specific. But until you have other clients, you don’t know.

That's been my experience doing embedded stuff. Customers will tell you they want a feature. You'll implement it then find out they don't need it enough to devote resources to utilize it on their end. So then it just lingers in the code base. And never gets real world testing. Lately I've been pulling unused features and consigning the code to the land of misfit toys. Just so I don't have to maintain them.

If that’s the case and they were willing to fully pay for the feature, at professional services + markup hours, it wasn’t a loss. In our case, we still got unofficial professional services revenue and recurring subscription revenue, and now we have a “referenceable client”.

If it’s behind a per client feature flag, it doesn’t cause any harm.

Any of these rules comes with "use your best judgement" and not drive off a cliff blindly following it.

> I asked the developer and they told me, "I have this philosophy that says when you have only two of something, it's better to just copy and paste the code, but of course when you get to the third one you want to start to think about refactoring and combining them."

I would argue this is a good rule of thumb, but nothing should ever be a hard unbreakable rule. The usual refactoring rules should apply too: if 95% of the code is the same, you should probably go back and refactor even without a third use case because it sounds like the two use cases are really just one use case with slight variation.

If the compiler didn’t catch it, doesn’t that say it was modeled incorrectly?

Why not have an IAirSpace interface or an abstract AirSpace class with two specializations? If there were processes that could handle either it should take an AirSpace class, one that could only handle one or the other took the specialization.

If the steps were the same for handling both, have step1...step(n) defined in the concrete class and have a coordinating “service” that just calls the steps and takes in an IAirSpace.
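A minimal sketch of that shape in Python (all names and fields are hypothetical — the original importers were in another codebase): the shared data definition exists once, the coordinating "service" runs the steps, and each feed only overrides the ~5% that differs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Airspace:
    """Fields shared by both feed types, defined exactly once."""
    name: str
    min_altitude: int
    max_altitude: int


class AirspaceImporter(ABC):
    """Coordinating service: the common steps live in one place."""

    def run(self, record: dict) -> Airspace:
        airspace = self.parse_common(record)
        self.parse_specific(record, airspace)
        return airspace

    def parse_common(self, record: dict) -> Airspace:
        return Airspace(record["name"], record["min_alt"], record["max_alt"])

    @abstractmethod
    def parse_specific(self, record: dict, airspace: Airspace) -> None: ...


class ClassAirspaceImporter(AirspaceImporter):
    def parse_specific(self, record, airspace):
        pass  # class airspace has no extra fields in this sketch


class SpecialUseImporter(AirspaceImporter):
    def parse_specific(self, record, airspace):
        # the part that differs: activation instructions
        airspace.active_times = record.get("active_times")
```

Downstream code that only needs the shared fields takes an Airspace; code that needs activation times takes the specialization, so a mixed-up field can't compile-check its way through silently duplicated structures.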

To be honest I don't get it. What language was this in, that you couldn't duck type the data or make the data structures inherit from some shared interface that encapsulated the commonalities?

That is a reasonable question. You may have noted that I left some details vague, so all I ask is that you trust me that the overall story is true.

Of course if you don't trust me, that's cool too! ;-)

Hah! Totally do, I'm just nerd sniped by the details I guess :-)

Rules of thumb are just that. Developers still have to understand the underlying motivation (incidental vs inherent duplication).

> The Rule of 3 is a great rule, except when it isn't.

"Rules are for the guidance of wise men and the obedience of fools."

It's unfortunate that you had to deal with a fool, but that's not an indictment of the particular rule that they picked to follow off a proverbial cliff.

Edit: fixed ambiguous quote formatting.

Thanks for your reply. I just want to note that what appears to be a quote in your comment ("Rules are for the guidance of wise men and the obedience of fools") is not something I wrote at all. If you still have time to edit the comment, I might suggest changing that so it doesn't look like you quoted me.

> But because of blind obedience to the Rule of 3, there were many thousands of lines of code duplicated, both in the importer and in its downstream clients.

Well sure, blind obedience to anything at all can cause problems.

There’s a safety issue here. I can conceive of these code blocks diverging more with time. Plus, is there really a cost to duplication?

but it was used in many places, so the rule of three applies.

Personally, I would have kept the code that he wrote and made something that handled the special cases you are talking about.

Everything should be replaceable, not extendable. Now if the special cases change, my code can be thrown away without changing the data feed code.

Are you sure merging code for different datafeeds would be better, though? In such cases, what is identical and what is not should be cross-referenced in comments. But you don't know beforehand which approach would be better, unless you know the datafeeds will stay the same as now.

The sad story here is that if you know the datafeeds will stay pretty static, there's little to gain from making an advanced abstraction over them. Which is why you often find duplicated code that hasn't been touched for years: the original target was met with a naive approach, and the absence of new changes leads to stale codebases.

If you have a 95% match on something nontrivial (and it likely won't diverge significantly), I'd go for merging even with 2 cases. At least merge most of the common parts.

Reading a couple of ifs and some not-quite-duplicate procedures seems much better than having a complete 2-set in cross-referenced files.

Why are you reading a couple of ifs instead of having the two similar things represented by separate classes with shared functionality in a common class? Or even if you prefer composition to inheritance you could still make it work cleaner without a bunch of if statements.

With a 95% match, you really only have one use case, with some minor variation.

No, you also need to ask not just whether they're similar, but whether they are the way they are for the same reason, and are thus likely to change together. It doesn't matter if there are 10, if they all might change independently in arbitrary ways.

"DRY" as coined in The Pragmatic Programmer spoke in terms of pieces of knowledge.

If you are combining code just because it looks similar, you're "Huffman coding."

Every time I have seen a feature that was written general to handle possible future uses, after a year of sitting unused there will certainly be modifications or surrounding code that doesn't support the planned extensibility. So it can never be used without a lot of work changing things anyway.

Future coding has led to some of the most overcomplicated systems I've worked with. It's one of the reasons (among many) I quit my last job. I was constantly told code that had no use cases was "important to have" because "we needed it".

So does that same idea apply to all of the many abstractions that geeks build just to stay vendor or cloud agnostic, just in case one day AWS/Azure go out of business?

On the other hand, I worked on a multi-million line codebase that was deeply joined to oracle’s db, with a team who all really wanted to move away from it but couldn’t because in the beginning (a decade earlier) the choice had been made to not put in “unnecessary” abstractions.

It’s not about the abstractions. In the case of Oracle or any other database, if you’re only using standard SQL and not taking advantage of any Oracle specific features, why are you spending six figures a year using it?

The same can be said about your cloud provider. If you’re just using it for a bunch of VMs and not taking advantage of any of the “proprietary features” what’s the purpose? You’re spending more money than just using a colo on resources and you’re not saving any money on reducing staff or moving faster.

You’re always locked into your infrastructure decisions once you are at any scale. In the case of AWS for instance (only because that’s what I’m familiar with), even if you just used it to host VMs, you still have your network infrastructure (subnets, security groups, NACLs), user permissions, your hybrid network setup (site-to-site and client-to-site VPNs), your data, etc.

In either case, it’s going to be a months long project triggering project management, migrations, regression tests, and still you have risks of regressions.

All of the abstractions and “repository patterns” are not going to make your transition effort seamless. Not to mention your company has spent over a decade building competencies in the peculiarities of Oracle that would be different than MySql.

After a decade, no one used a single stored procedure or trigger that would be Oracle specific? Dependencies on your infrastructure always creep in.

Yes. There's no point abstracting over a vendor API if you're not actually using an alternative implementation (even for testing). Otherwise, keep your code simple, and pull out an abstraction if and when you actually have a use case for doing so.

We had platform agnostic discussions with no concrete plan or action by anyone to actually escape our platform.

Vendor agnostic code doesn't anticipate AWS going out of business, just them raising prices significantly. It can be smart to be able to switch in a reasonable amount of time so that you can evaluate the market every few years. This way spending extra time to be vendor agnostic can also pay off. But there's no technical reason for that, it's a cost issue.

It’s often noted that in almost 15 years of existence, AWS has never increased prices on any service.

What are the chances that AWS will increase prices enough that all of the cost in developer time and complexity of “abstracting your code”, plus the cost in project management, development, regression tests, risks, etc., makes it worthwhile to migrate?

The cost of one fully allocated developer+ qa+ project manager + the time taken by your network team + your auditors, etc and you’re already at $1 million.

Do you also make sure that you can migrate from all of the other dozen or so dependencies that any large company has - O365? Exchange? Your HR/Payroll/Time tracking system (Workday)? Windows? Sql Server? SalesForce? Your enterprise project management system? Your travel reimbursement system (Concur), your messaging system? Your IDP (Active Directory/Okta)?

Old boss of mine had a great line: "two points always make a line, but three usually makes a triangle"

Your boss made a good point. My boss also made the exact same good point. But between those two same good points, there isn't a great line. ;)

We had introspective bosses?

I think the catch all term for that is YAGNI.

YAGNI is about not adding functionality until it's needed. DRYing code isn't adding functionality.

I think GP is implying the thing you're not going to need is a generic solution to the repetition. i.e: DRY is in fact a feature.

DRYing existing code to handle exactly what is needed to support the current uses with no additional axes of variation isn't adding functionality (well, if it handles more than the existing values in the existing axes of variation, it's adding some but arguably de minimis functionality.)

Building code with support for currently-unused variation to support expected future similar-but-not-identical needs, that is, pre-emptively DRYing future code, is adding functionality.

WET - Write Everything Twice

What is YAGNI?

You Ain't Gonna Need It

Write once, copy twice, refactor after 3.

On one project we ended up with a series of features that fell into groups of threes, and we kept trying to make the 2nd feature in the series generic, and time after time #3 was a rewrite and significant rework of #2. So any extra time spent on #2 was time wasted.

I think that what Burlesona is suggesting is more nuanced (and effective) than the rule of three. We can easily imagine situations where code used twice warrants a refactor, and situations where code used three times does not.

The rule of three suffers the same problem as the pre-emptive refactor - it is totally context insensitive. The spirit of the rule is good, but the arbitrary threshold is not.

Similarly, 99% of your comment is bang-on! My only gripe is the numeral 3. But pithy rules tend to become dogma - particularly with junior engineers - so it's best to explain your philosophy of knowing when to abstract in a more in-depth way.

> My only gripe is the numeral 3. But pithy rules tend to become dogma

Agree, 3 is a pretty arbitrary number. If you have a function that needs to work on an array that you know will certainly be of size 2, it takes minimal effort and will probably be worthwhile to make sure it works on lengths greater than 2.

But the bigger point is valid: you need examples to know what to expect, and if you make an abstraction early without valid examples, you'll almost certainly fail to consider something, and also likely consider a number of things that will never occur.
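The size-2 case is easy to illustrate. A hypothetical helper that was first written assuming exactly two values costs nothing to generalize, and the general version drops a silent assumption:

```python
def average(values):
    """First draft assumed exactly two values:
        return (values[0] + values[1]) / 2
    Generalizing to any non-empty sequence is the same amount of code
    and one fewer hidden length-2 assumption."""
    return sum(values) / len(values)
```

This is the cheap end of generalization; the comment's bigger point is that anything beyond it needs real examples to guide it.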

> make it “generic enough to handle future use cases”.

The answer to this is usually YAGNI.

That is, don’t plan for a future you might never have. Code in a way that won’t back you into a corner, but you don’t know what the future’s cases might be (or if there even will be any), so you can’t possibly design in a generic way to handle them. Often you just end up with over-engineered generic-ness that doesn’t actually handle the future cases when they crop up. Better to wait until they do come up to refactor or redesign.

Some people argue to design in a way that lets you rewrite and replace parts of the system easily instead.

The repetition is what is YAGNI!

Repeating code 7 times in preparation for separate evolution of those 7 cases is YAGNI, unless the requirements are on the table now.

Merging repeated code into one is something that is demonstrably needed now, not later.

It's not so much that you're preparing for 7 separate cases, as a thing that works in one place and almost, but not quite, fits in 6 others. You rarely exactly duplicate code, but often duplicate and modify.

By the time you hit 7, you do clearly Need It. But now you've got 7 cases to work from in writing the generalization. When the number is 2, it's often reasonable to say, "I don't know how these will evolve and I'll probably guess wrong".

Yes, I agree. That’s not what I was replying to, though. I noted in another comment that I consider merging worthwhile even in the absence of three use cases, certainly if what you have now is very similar.

If all seven are the same except for one or two cases, it doesn’t mean that you have to have a bunch of if statements, you either use inheritance or composition to create special cases and judiciously apply “pull members up” and “push members down” refactoring, interfaces, abstract classes, virtual methods, etc. These are all solved problems.

Yes I know about the whole “a square is not a rectangle problem”.

"Premature optimization something something..." ;)

That's actually "premature generalization".

Isn't premature optimization a generalization of premature generalization? ;-)

Oliver Steele describes "Instance First Development", which the language he designed, OpenLaszlo, supported through the "Instance Substitution Principle". I've written about it here before, and here are some links and excerpts.


In the right context, prototypes can enable Instance-First Development, which is a very powerful technique that allows you to quickly and iteratively develop working code, while delaying and avoiding abstraction until it's actually needed, when the abstraction requirements are better understood and informed from experience with working code.

That approach results in fewer unnecessary and more useful abstractions, because they follow the contours and requirements of the actual working code, instead of trying to predict and dictate and over-engineer it before it even works.

Instance-First Development works well for user interface programming, because so many buttons and widgets and control panels are one-off specialized objects, each with their own small snippets of special purpose code, methods, constraints, bindings and event handlers, so it's not necessary to make separate (and myriad) trivial classes for each one.

Oliver Steele describes Instance-First Development as supported by OpenLaszlo here:

Instance-First Development

[...] The mantle of constraint based programming (but not Instance First Development) has recently been taken up by the "Reactive Programming" craze (which is great, but would be better with a more homoiconic language that supported Instance First Development and the Instance Substitution Principle, which are different but complementary features with a lot of synergy). The term "Reactive Programming" describes a popular old idea: what spreadsheets had been doing for decades. [...]


Oliver Steele (one of the architects of OpenLaszlo, and a great Lisp programmer) describes how OpenLaszlo supports "instance first development" and "rethinking MVC":

[...] I've used OpenLaszlo a lot, and I will testify that the "instance first" technique that Oliver describes is great fun, works very well, and it's perfect for the kind of exploratory / productizing programming I like to do. (Like tacking against the wind, first exploring by creating instances, then refactoring into reusable building block classes, then exploring further with those...)

OpenLaszlo's declarative syntax, prototype based object system, xml data binding and constraints support that directly and make it easy.

OpenLaszlo's declarative syntax and compiler directly support instance first development (with a prototype based object system) and constraints (built on top of events and delegates -- the compiler parses the constraint expressions and automatically wires up dependencies), in a way that is hard to express elegantly in less dynamic, reflective languages. (Of course it was straightforward for Garnet to do with Common Lisp macros!)


Instance-First Development:

>The equivalence between the two programs above supports a development strategy I call instance-first development. In instance-first development, one implements functionality for a single instance, and then refactors the instance into a class that supports multiple instances.

>[...] In defining the semantics of LZX class definitions, I found the following principle useful:

>Instance substitution principle: An instance of a class can be replaced by the definition of the instance, without changing the program semantics.

In OpenLaszlo, you can create trees of nested instances with XML tags, and when you define a class, its name becomes an XML tag you can use to create instances of that class.

That lets you create your own domain specific declarative XML languages for creating and configuring objects (using constraint expressions and XML data binding, which makes it very powerful).

The syntax for creating a bunch of objects is parallel to the syntax of declaring a class that creates the same objects.

So you can start by just creating a bunch of stuff in "instance space", then later on as you see the need, easily and incrementally convert only the parts of it you want to reuse and abstract into classes.

What is OpenLaszlo, and what's it good for?

Constraints and Prototypes in Garnet and Laszlo:

This gives me nightmares of the over-engineered XML programming that is infamous in the Java community. You lose all of the benefits of static type checking.

Generic enough code that can be adapted to future cases is code with clean architecture that follows standard OO principles.

On the other hand, trying to handle all the hypothetical cases because "that makes the code generic and future-proof" is usually a complete waste of time.

My view is to develop the simplest, well architected OO code that can handle the use cases at hand.

After witnessing the collateral damage of many software reuse projects (anyone remember "components"?), I came up with a different ruleset, useful for "compromising" with "software architects":

First, translate the data.

Second, divine a common format and share the data.

Third, create the libraries for this common format, to be reused amongst projects.

I have never reached #3 in my professional career. Sure, we wrote the libraries. But other teams and projects never adopted them before the whole effort became moot.

So I kept my projects intact and moving forward, while letting mgmt think they're doing something useful.
