Confessions of an Abstraction Hater (250bpm.com)
121 points by rumcajz 89 days ago | 131 comments

I personally wouldn't even call the author's given example abstraction, it's just splitting code into functions.

Splitting code into functions is mostly an aesthetic and "reading comprehension" choice so you get shorter, well-named functions -- as well as not too many variables to keep track of in a single scope.

Abstraction, in contrast, takes two similar but different functions (or other structures) and turns them into a single one that uses some kind of parameter instead.
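
To make the distinction concrete, here is a minimal Python sketch. All function names and tax rates are hypothetical and not taken from the article. It shows two similar-but-different functions, then a single function that turns the difference into a parameter.

```python
# Two similar but different functions (hypothetical example, amounts in cents).
def total_with_sales_tax(cents):
    return sum(cents) * 108 // 100


def total_with_vat(cents):
    return sum(cents) * 120 // 100


# The abstraction: one function, with the difference expressed as a parameter.
def total_with_tax(cents, percent):
    return sum(cents) * (100 + percent) // 100
```

Whether the merge is worth it depends on whether the two call sites will keep changing together, which is what the rest of this thread argues about.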

And the easiest rule I've ever found is to abstract whenever you've duplicated 3 lines of nearly-identical code, and also whenever you've written a single line of nearly-identical code 3 times.

If it hasn't been 3 times or 3 lines, it's not worth it.

And never abstract in advance (the author's "You can imagine a case where someone would like to call it from elsewhere") unless you know it will need to be called.

(Side note: also be extra-careful about abstracting stuff in business logic -- there's extra value in having code map 1:1 to specs or processes even if it winds up being redundant, for ease of maintainability later.)

I've found that it's much more difficult to unwind an abstraction than create one. And having similar, even identical, code in multiple places isn't necessarily a good reason to build an abstraction. The question to ask is not, are these similar, but will they act and change in the same ways (to your best knowledge) both now and in the future?

I agree with this comment wholeheartedly. Too often DRY is blindly applied to every little piece of code... in ways that violate SRP and lead to a tightly coupled mess of code. It seems that DRY is the most easily understood of the SOLID concepts, and is therefore, unfortunately, overapplied by many developers without an understanding of the downstream effects.

Nitpick: The original DRY principle in The Pragmatic Programmer was about not repeating requirements. If two pieces of code are very similar, but have to do with different requirements (e.g. business logic), then DRY suggests you not abstract it any further. DRY, as a principle, is very good, and I've not seen a case of proper use being problematic.

As the parent said:

>The question to ask is not, are these similar, but will they act and change in the same ways (to your best knowledge) both now and in the future?

If they represent different requirements, then the answer to the question is usually "No." So do not couple them together with an abstraction. If you do, you are not invoking DRY. You're just creating spaghetti.
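
A small Python sketch of that situation, with made-up requirements and names (none of them come from the book or the article): two checks that look identical today but answer to different requirements.

```python
# Two requirements that happen to share a value today (both hypothetical).
MAX_USERNAME_LENGTH = 30  # requirement: usernames must fit the login form
MAX_TITLE_LENGTH = 30     # requirement: titles must fit on one listing line


def is_valid_username(name: str) -> bool:
    return 0 < len(name) <= MAX_USERNAME_LENGTH


def is_valid_title(title: str) -> bool:
    return 0 < len(title) <= MAX_TITLE_LENGTH
```

Folding both into one shared helper would remove the textual repetition, but it would also tie the title rule to the username rule. If the listing page is later redesigned to allow longer titles, only `MAX_TITLE_LENGTH` should change.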

Nitpick: DRY isn't directly a SOLID concept.

I find SPOT (Single Point Of Truth) to be more to the point than DRY, because it is explicitly about semantics and not about mere syntactic repetition.

How have I never seen SPOT? I've read about the idea many times over the last ten years, but weirdly enough I've never seen it put into an acronym. I feel like rules of thumb in software never quite make it until they're an acronym.

I mean honestly the SOLID advice is pretty terrible, but is so often repeated because it has such a great acronym.

What is suboptimal about SOLID? I’m about to buy into it so I’d like to know.

Bout to see a movie with my wife so here's just a short description for now.

SRP. Advice everyone can agree on, but not very actionable. I've seen a dev make a change to conform to SRP while another dev argued the change was bad because it violated SRP.

Open/closed principle - if you look at its origin, this is just terrible advice. So everyone tortures this advice until it says something different.

Liskov substitution principle - sure, but as a top 5 most important idea in OOP?

Interface segregation principle - sure

Dependency inversion principle - I don't think abstractions are always better than concrete representations. In fact a lot of times they're worse. Humans are terrible at learning abstractions. So bad, in fact, that they learn abstractions from the concrete much more easily than from the abstractions themselves. And this is a huge cost to readability, and it's harder to get correct.

The problem with these acronymed principles is that there are always programmers who will treat them as religion (and senior programmers who will enforce it as such).

> explicitly about semantics and not about mere syntactic repetition

I'm hoping I remember to steal this later!

You may be interested in Sandi Metz's article "The Wrong Abstraction."


Love this post from Sandi, it nicely articulates the problems with naive DRY interpretation/application. The comment section on the page exhibits the stubbornness (although well intentioned) of those who perpetuate this anti-pattern.

I agree about not doing premature abstraction, and about thinking of future evolution rather than just current similarity when deciding whether to abstract or not. However, I don’t think it is difficult to unwind an abstraction. You just go to all the usages of the abstracted entity and replace them with specific code. Doing the opposite, on the other hand, is much harder if you have multiple sites that do something similar but in somewhat different ways because they were written separately - it is much harder to merge the usages into a single abstraction, for the same reason it is hard to switch between libraries that do the same thing: the different libraries have different APIs.

Why is it hard to merge them? If they're doing the exact same thing, replacing with an abstracted function should be straightforward. If they're doing different things, merging them may be unwise, since they may truly be different conceptually.

On the other hand, I've definitely seen it become near impossible to unwind an abstraction once it starts getting used by other abstractions that are used in many other places in your code (multiply by as many levels or permutations as it takes to achieve desired level of complexity). Meanwhile, this poor function has been twisted to satisfy all these unreasonable demands from every corner of your codebase, and teasing it apart becomes a Herculean effort.

Agree. The same principle that applies to code, applies to design: should be optimized for reading, not writing. I currently know some developers who have trouble switching away from IntelliJ IDEA because without it they can't navigate the entire abstraction hierarchy they've created.

DDD gets this right by focusing on the language of talking about the design. If you can't make a single description that succinctly and accurately formulates the purpose and mechanism of the abstraction, in a way that fits all cases, I generally find the abstraction to be a poor fit.

I've found DDD to be a great concept to apply to this problem. Define your domain based on business foundations that will last, and intelligent reusable code abstractions practically write themselves.

> it's just splitting code into functions

I feel you underestimate the concept of "function". We all tend to do that because they are abstractions so ubiquitous that we cannot even imagine a world without them. But it was not always so. Here's the first time someone speculates about "subroutines" (1953): https://web.archive.org/web/20150628022047/http://www.laputa...

>I personally wouldn't even call the author's given example abstraction, it's just splitting code into functions. Splitting code into functions is mostly an aesthetic and "reading comprehension" choice

Actually "splitting code into functions" was one of the first major abstractions invented for structured programming.

Not to mention that functional programming (and lambda calculus) are all about functions...

> And never abstract in advance

I would contest that, when the abstraction you're going for is actually a library. If I want to send SMS then pulling in the Twilio SDK is a useful and effective abstraction compared to rolling my own API client.

Similarly, MVC is a useful abstraction too. I'd prefer to kick start a Rails project in a few minutes as opposed to piecing together a full web stack, and I'd take a desktop framework over building all of that from scratch.

There does seem to be a great deal of confusion about what abstraction actually means. I've seen 4000-line Python modules full of indirection up and down the inheritance chain described as "nicely abstracted" rather than "unmodifiable spaghetti." The reasoning was fuzzy: "you instantiate it and it just does everything you need." But you can't build anything new with it, and it makes so many assumptions about its environment that you can't change anything around it. However you define "abstraction", I'd call that a "boat anchor" instead.

Another good rule of thumb: if you can't find a good name for the thing you want to abstract, it might be a bad idea.

To me it's all about putting related stuff together. LoC doesn't matter much to me either way. Here's some more of my thoughts on the matter: https://nixpulvis.com/ramblings/2019-01-05-splitting-cats-an...

Good naming, often a causality in older codebases, helps this a lot.

Abstraction hurts when things are abstracted and named poorly but helps a massive amount with readability when done well.

If I’m looking at a (software) method to cook a meal show me the high level salient features of that process in some centralized place. Don’t show me the angle of the knife blade or the exact movements used to chop the tomatoes. It should be (simplified in many ways):
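
A hypothetical Python sketch of such a top-level method. The step names echo ones mentioned in this thread (ChopVegetables, PreheatOven, CombineIngredients), while `bake`, the `log` list, and all the bodies are placeholders assumed purely for illustration.

```python
def preheat_oven(log):
    log.append("preheat oven")


def chop_vegetables(log):
    # Knife angles and chopping technique would live here, not in cook_meal.
    log.append("chop vegetables")


def combine_ingredients(log):
    log.append("combine ingredients")


def bake(log):
    log.append("bake")


def cook_meal(log):
    """Top-level recipe: only the salient steps, in order."""
    preheat_oven(log)
    chop_vegetables(log)
    combine_ingredients(log)
    bake(log)
    return log
```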





Sadly the problem is this basic level of software abstraction is preserved in much enterprise software but ChopVegetables upon closer inspection - and after drilling down into other methods - will also purchase missing ingredients and if you don’t have the money to buy them will also sell your items including the already preheated oven you need to cook the food.

Quite serendipitously: https://twitter.com/h_liyan/status/972475860660842496

"Programming is like cooking: in Python, you use pre-made bolognese sauce; in C++, you start from fresh tomatoes and minced meat; in Assembly, you have a farm where you grow your tomatoes and raise your cow." - @gv_barroso

"In #Java, first you think about Cuisine, then ItalianCuisine, then you think of Bolognese as a specific case of Sauces, which is in turn a more specific case of all things with an EdibleInterface, and whether Tomato is a Fruit or Vegetable. In the end, you die of hunger."

While I like the humour a lot...

I don't think Java is what it was about 8-15 years ago. I did a little at university in the late 90s, and have started working with it professionally just in the last three years. AFAICT the enterprise OO nightmare is over, and it's much more like working with Python - pick some components and libraries that implement the details of what you need, then build your business logic. With lambdas, streams, futures etc, it becomes quite straightforward and powerful.

Whilst Java the language has progressed, Enterprise OO is sadly far from dead :(

Modern Java has a lot of features borrowed from functional programming, but using them is still pretty awkward and verbose.

Kotlin is vastly better in this regard IMHO.

That depends a lot on where you end up. We're still doing Enterprise OO, and it's still very much alive.

I love when developers make food analogies, they almost always show their ignorance of cooking, and sometimes of software development. After 30+ years of software development, I always read the actual code to find out how software works. "CombineIngredients" is useless and honestly quite horrific. Instead of "simplifying" it is obfuscating. The details of cooking are in the function. I don't need to see "PreheatOven" at the same level as "CombineIngredients". One of my favorite posts on this subject is: http://number-none.com/blow/john_carmack_on_inlined_code.htm...

Tucking away code in a function serves a purpose, and that purpose is not to make the highest level function hold no detail. Like cooking, software is about the details, so don't hide them from me. I can't stand the "Russian Doll" approach of a function that just calls another function that just calls another function that just ... Nothing improves the understandability of 10 lines of code like spreading them across 10 functions, often in different files /s

Great abstractions exist, but very few abstractions are great.

In my experience Unreal Engine 4 suffers from this. It's a fantastic game engine and in many ways its code is really great, but on many of the occasions when I've had to trace call stacks in its source, it's a mess of long function call chains marred with branching conditions and extremely similar sounding function names.

I think it's a good example of how code can look clean but actually be a bit of a mess to work with. In contrast at my job I've been making heavy use of structs and functions scoped solely to cpp files. The result has been in my experience somewhat messy-looking code which actually has very low cognitive overhead and is very easy to change.

Sometimes long functions are a viable answer. It all depends on whether it actually makes sense to tear things out, otherwise that function called in only one place you pulled out is just polluting the scope. There are other ways to handle it: local function, placing the chunk of code in its own scope limiter, and possibly comments documenting and breaking down the steps.
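
As a rough illustration in Python (the function and field names are invented for the example): Python has no block-level scope limiter, so this sketch only shows the local-function and step-comment options.

```python
def summarize(readings):
    # Step 1: drop missing values. The helper is local, so it cannot be
    # called from elsewhere and adds no name to the module scope.
    def is_present(value):
        return value is not None

    present = [r for r in readings if is_present(r)]

    # Step 2: compute the statistics, or give up if nothing is left.
    if not present:
        return None
    return {"min": min(present), "max": max(present), "count": len(present)}
```

Everything stays in one place and reads top to bottom, yet the steps are still named.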

Yes. For many codebases, the primary purpose of the source code is being read by humans with the goal of understanding its actual behavior. When I'm reading through a codebase obfuscated by single-use helper functions, the first thing I do is inline them all. Bugs often result from the software doing something different from what its author intended, so the human-authored function name is not a substitute for reading its implementation.

I've noticed that in order to solve "what abstraction should I use here" it is often enough to solve "what should the name of the abstraction be".

I suppose this is because a name essentially sets up an analogy and if you've picked the correct analogy you can freely exchange everything you already know between the analogized domain and your problem domain.

That's the cost of abstraction. Abstraction done well is great. But "done well" means specifying the behaviour, all the quirks and corner cases, documenting it and keeping the docs in sync with the code. That's prohibitively expensive. It's only done, to some extent, for system libraries.

So we end up with a lot of badly-executed leaky abstractions.

It's a cost problem and with the current economics of software development I see nothing we can do to fix it.

But at least we can avoid unnecessary abstraction: Is this function called only from a single place? Yes? Then get rid of it.

> But at least we can avoid unnecessary abstraction: Is this function called only from a single place? Yes? Then get rid of it.

To me it makes perfect sense to have a function that’s only called from a single place. If the function represents a coherent abstraction (ie. decodeDateStamp, not continueCreatingUser2), then it makes the code easier to read.

If it's only called from one place, why not inline it and write a comment? Even just inlined with only the function name you were going to create, as a comment, would be easier to read than a function call to another file. How is that worth the overhead when the function only has one call site?

Because bugs per line increase when functions get longer than one screen fold.

As well as reducing the conceptual length of your function, bundling up part of it in another function called only once makes a statement, possibly even some guarantees about which variables are and aren't modified by the sub function.

It may also decrease the number of variables used in your function and hence the risk of misusing one.

Taking this logic to the extreme, several languages let you define functions within other functions. Since everything is ultimately just used by main, why not define everything in main?

Well, those local functions usually capture context too, so every local variable of main is now effectively global - with all the difficulties of reasoning about widely shared global state that brings. On large codebases you now have a main function that's millions of lines of code - a bit hard to reason about.

In a less extreme example, I more commonly see functions in the several thousand lines of code range. John Carmack has actually argued for this "style C" at least in some cases: http://number-none.com/blow/john_carmack_on_inlined_code.htm... . When it's a simple linear flow like those examples, I'm even relatively OK with that style!

But IME clean examples like that are the exception, not the rule. Most functions with thousands of LOC have very high cyclomatic complexity scores too, and are hard to reason about. I have significant trouble keeping track of invariants and edge cases in such code. Ultimately, it can be done, but you have to figure out how to digest such code into bite sized chunks of functionality and what they operate on.

Functions declare in code those chunks, imposing some limits on how they interact with other chunks - only interacting through global, member, or input variables. You're forced to give them names, which are much better maintained than the section comments.

OTOH - 1 line functions used in one place? I'm quite willing to inline those. And larger functions. There's a sweet spot to cater to - something about how much code you can reasonably keep in your head and reason about all at one time.

If it's only called in one place, it should be easier to optimize as well, because the context of the call is clear.

> often a causality in older codebases

Did you mean casualty?

Yes I did. I typed this out on a mobile phone and missed that one. Now I can't edit it.

I have a strong suspicion that heavy-handed enforcement of DRY (Don't Repeat Yourself) has led to large codebases containing far too many abstractions in a foolish attempt to literally never repeat yourself.

DRY can be a useful guideline but please don't enforce absolute, strict adherence. Sometimes a little duplication is okay and helps avoid complex, inflexible abstractions.

Also quoted as: "Duplication is better than the wrong abstraction"

I'm a huge proponent of this concept. The goal is to be able to build and iterate as fast as possible. Most of this is achieved when new folks can step in and understand the code right away. Needless abstraction slows this down and makes the app more brittle to changes in response to customer/client inputs.

Sandi Metz had an awesome presentation about this during a Rails Conf.

"Duplication is far cheaper than wrong abstraction"


Her blog talking more about the assertion:


This is one of the idioms of Go I liked. "It's OK to repeat the same 3 lines of code."

My thought process has always been: If there is a common piece of functionality to be shared, then refactor. If it just happens to be the same, maybe leave it.

> It's OK to repeat the same 3 lines of code

… 2–3 times. Not ok to repeat them 1000 times.

No, Go repeats the same 3 lines of code millions of times that you should never really have to do. There is no reason you should have to constantly throw in err != nil boilerplate when other languages have Optionals that make this automatic (and typechecked too). Likewise with for loops that could be done much more simply with proper maps and reduces.

OTOH, when you're repeating the same 3 lines of code over and over, refactor so it can fit in one line of code. This is one of my big peeves with Go; in C at least I can make a RETURN_ERR macro.

I agree, but I also think that you can harmonize DRY and "Don't abstract everything" by observing that repetition is a higher bar than it may first appear.

A common DRY failure is when you abstract one function, then you find another place you can put it if you just add one more parameter, then you find that if you add two more parameters it fits here and here, and repeat until you've got a 12-parameter monstrosity loaded down with if statements and conditionals and blocks of code whose sole purpose is to preprocess the optional arguments into something the rest of the code will take... but what got you there still isn't the DRY principle, but that the repetition was actually never there in the first place. In the worst case you can end up with two or more things that should actually be entirely separate functions.
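
A scaled-down Python sketch of that failure mode. The names and flags are invented for illustration, and a real case would have far more parameters.

```python
# After a few rounds of "just add one more parameter" (hypothetical):
def render(value, as_money=False, as_percent=False, decimals=2, symbol="$"):
    if as_money:
        return f"{symbol}{value:,.{decimals}f}"
    if as_percent:
        return f"{value * 100:.{decimals}f}%"
    return str(value)


# The same outputs from two separate functions that share almost no lines:
def render_money(value, symbol="$"):
    return f"{symbol}{value:,.2f}"


def render_percent(value, decimals=2):
    return f"{value * 100:.{decimals}f}%"
```

In `render`, `symbol` only matters on one branch and the branches have nothing in common, which is a hint that the repetition was never really there.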

I've personally done that once myself; tore one of these functions up into two functions literally by copying and pasting the lines of the original into the two new functions... and there was virtually no line that went into both the new ones. Post simplification the sum of the two new functions was about half the length of the original. And it wasn't anybody's "fault"... it had just grown that way, one commit at a time, over dozens of commits.

DRY is still really important, but there's a lot of complexity/wisdom in what the true meaning of "repeat" is.

My rule of thumb is always "which version is easier to understand and verify for a code reviewer?"

Sometimes doing DRY means you've nicely separated out and encapsulated the common code that can be checked just once in isolation. Sometimes doing DRY means you end up with a complex abstraction that is hard to understand on its own.

I'm not sure what this has to do with abstractions. What I call abstraction is a cognitive pattern that allows me to model a specific entity with a semantic that foregoes any implementation specific details.

Example of abstraction: file descriptors. I can open()/read()/write()/close() [to/from] a file descriptor and I can model and understand the semantics of it without knowing what's "behind" the file descriptor. Is it a pipe? Is it a network socket? Is it a file on the drive? a partition? a whole drive? a logical volume across many drives? perhaps encrypted? I don't care! That's what the abstraction brings to me: not needing to care.
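
A minimal Python sketch of the same idea, using the `os` module's thin wrappers over the descriptor calls. The `send_greeting` function is a made-up example; it never learns whether it is writing to a pipe or to a regular file.

```python
import os
import tempfile


def send_greeting(fd):
    """Write to a file descriptor without knowing what is behind it."""
    os.write(fd, b"hello\n")


# Behind this descriptor: a pipe.
read_end, write_end = os.pipe()
send_greeting(write_end)
os.close(write_end)
from_pipe = os.read(read_end, 100)
os.close(read_end)

# Behind this descriptor: a regular file on disk.
file_fd, path = tempfile.mkstemp()
send_greeting(file_fd)
os.close(file_fd)
with open(path, "rb") as f:
    from_file = f.read()
os.remove(path)
```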

Not needing to care reduces the cognitive load on me, which means it reduces complexity. Since our propensity to make mistakes increases very quickly with complexity, keeping it low is key to success.

What the article describes instead is breaking down code into individual functions, and the common trap associated with imaginary requirements ("You can imagine a case where someone would like to call it from elsewhere").

> “Not needing to care reduces the cognitive load on me, which means it reduces complexity. Since our propensity to make mistakes increases very quickly with complexity, keeping it low is key to success.”

I actually have come to disagree strongly with this beyond a very, very tiny amount of basic attribute encapsulation, over my career.

Inevitably, 99% of your time becomes dealing with the headache of how the abstraction hides the underlying details at the next reductive step down the chain. You quickly realize that the “win” you get from being able to express a program more concisely in terms of a layer of intermediate abstractions is pretty meaningless, because the number one thing you and all future code readers will need is immediate visibility into how properties of the implementation result in certain resource usage, running time, etc. The number one day to day activity will be performing surgery at some layer below the top line abstraction, to such a degree that the abstraction is nothing more than a nuisance.

This is also one reason why a functional style tends to work so well. You separate data encapsulation (in dumb no-method, no-inheritance record types) from instruction encapsulation (just functions), and the abstraction does not have much effect on visibility of the primitives it relies on from the next lower reductive layer of components.
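
One way that separation might look in Python, as a sketch under assumed names (`Account` and `deposit` are invented for the example): a frozen dataclass as the dumb record, and a plain function for the behaviour.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    """Dumb record: data only, no methods, no inheritance."""
    owner: str
    balance_cents: int


def deposit(account: Account, amount_cents: int) -> Account:
    """Pure function: returns a new record and leaves the input untouched."""
    return replace(account, balance_cents=account.balance_cents + amount_cents)
```

The record's fields are fully visible to any function that works on it, so nothing about the data is hidden behind the abstraction.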

The author’s views on abstractions and developers in general seem absolutist and overly simplistic to me.

Everything in moderation is a good rule.

In general I find a lack of abstractions a sign of lacking maturity, but I find the same of overusing abstractions (“defensively abstracting”).

Getting the balance right takes experience, and you gain that experience by making both of the above mistakes, and typically in that order.

The problem with abstractions is that you don’t really notice good or working abstractions. You only notice the bad ones.

Thus the false developer-meme that abstractions themselves are a bad thing.

I agree, but in my own experience at a dozen different shops, insufficient abstraction is a mistake that a first-year developer makes, and excessive abstraction is a mistake made by nearly every developer with more than a year of experience. Thus, the team leaders and senior developers can find and catch many of the errors due to insufficient abstraction, but the errors of excessive abstraction tend to accumulate.

OTOH, it's equally important to stress this section of the post: "Not that we can do without abstraction. We need it to be able to write code at all"

In the last 4-5 years I encountered a lot of such "hate" against abstraction and without knowing it I _STOPPED_ writing abstractions. None at all. I feared writing abstractions because people advertised them as evil. The years have gone by, but the codebase needs to be maintained. There are often days when I have to spend a shitload of time doing something simple just because I never created abstractions where I should have.

So yeah, my fellow programmers, please do not scare off young programmers from using abstractions. They might write the wrong abstractions sometimes, but hey, how else do you learn if not from mistakes?

We had a ton of endpoints without any form of abstractions. Need to do something on all endpoints? Have fun copy/pasting into 80 source files.
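
For the "do something on all endpoints" case, one possible shape in Python is a decorator. This is only a sketch: the handler names, the request dictionaries, and the logging concern are all assumptions, and a real web framework would offer its own middleware hooks.

```python
import functools


def with_request_log(log):
    """Hypothetical cross-cutting concern: record every call to an endpoint."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(request):
            log.append(handler.__name__)
            return handler(request)
        return wrapper
    return decorator


calls = []


@with_request_log(calls)
def get_user(request):
    return {"user": request["id"]}


@with_request_log(calls)
def get_order(request):
    return {"order": request["id"]}
```

Each endpoint still needs the one-line decorator, but a later change to the shared behaviour happens in a single place instead of 80 source files.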

We don't need it to write code, and maybe not even to be efficient at it. Carmack wrote an article about the time he worked with Saab's aerospace people. They write flight controls, track databases, sensor control, targeting, etc. with no calls or backward branches except the end of the main loop. So every matrix multiply is duplicated in place. That is very complicated code. Carmack was inspired by that and used a less strict form of that ruleset for his next project. He allowed for functions as long as they were pure but nothing else.

I half agree with this. The problem is: what's the alternative? The reason we have abstraction is because not abstracting also has a cost. Consider encountering a legacy codebase with no abstractions. Yes, perhaps each unit of code is more digestible, but there's so many more units. The codebase is 10x as large, because there's no abstraction. Is it really easier to comprehend that way?

As in everything else, it boils down to an issue of balance.

I have an overriding design principle that helps find where that balance should be: "Attempt, within reason, to minimize the number of places a reader must look in order to understand the code on his or her screen. Minimizing the number of changes required to alter functionality is a secondary but lesser goal. Minimizing the number of characters the programmer must type to implement functionality is explicitly a non-goal."

For large C++ codebases at work, this leads to coding standards like:

- Don't introduce typedefs referencing other typedefs in public API, especially when the typedefs and the API that uses them are in different files. Having to look in more than half a dozen different files just to understand the signature of a function is actively reader-hostile.

We have a ton of conventions similar to this that help to make the code significantly more pleasant to read. It turns out that we still have quite a few useful abstractions, but they tend to be fairly large compared to most codebases I've worked with previously.

I think refactoring large blocks of code into functions named after their purposes is good abstraction. But introducing whole new constructs (Proxy, Adapter, Bridge, Factory, Repository) should only be done when it makes the code simpler than the function-only alternative.

Some time ago, I would have agreed with you about Adapters, Factories and so on. But now, I'm not so sure.

My team inherited a code base that wasn't really legacy - maybe two years old - with a very simple business logic: Collect data from one REST service, save it in a "cache" database and answer to requests from another one. Seems completely simple, and the code wasn't awful, either. However, there was no abstraction, no layers, nothing: The REST controllers knew exactly what the DB looked like and contained a large part of the collection logic. There was a client for the collection, but it was very low level and exposed everything. Additional domain logic was smeared over five to ten levels of hierarchy.

In the end, each apparently simple feature we had to add took days instead of hours because we had to touch so much code. The principle of locality was completely absent. In the end, we rewrote large parts of it so that you don't have to change the REST controller anymore if the DB schema changes.

My personal lesson was that you should always have some abstraction, even if the service's task is so simple. Things will change, and you must be prepared for that. If every part of your code knows all the other parts, you're doing it wrong. Have clear interfaces and boundaries (and models) so that changes can be localized. Of course, that doesn't mean that you should implement something like: https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris... .

I once inherited a project implemented in Spring. It meant that while we wanted to interview for bright people who would be able to understand the business domain, we actually had to interview for people who knew Spring, and might be able to pick up the domain knowledge. Otherwise the overhead of bringing people up to speed on Spring was too much.

Usually abstraction adds LOC

If abstraction adds LOC on net to your project, you're doing it wrong.

In the very example in the OP, the abstraction adds lines of code: function calls, function signatures, possibly return statements. I think many programmers associate abstraction with having less code (which is almost always good, under reasonable circumstances of course), and this translates into a love of abstraction even in cases where it doesn't simplify things or lead to less code.

Abstractions should reduce complexity, which often correlates to LOC, but is not necessarily so.

Ya, but that's because the abstraction in the OP isn't complete. It's showing the abstraction of a single instance of a subproblem, for clarity. But real abstraction generally abstracts multiple instances of a subproblem (or at least, it should), and thereby reduces LOC.

I find that Casey Muratori's "Compression Oriented Programming" helps strike a perfect balance between too much abstraction and too little of it:


It would seem useful to retain a copy of the original code in the "simpleton" format.

This is because it is easier to "refactor" simple into complex than to refactor complex into simple.

In the simple format the work is accessible to more people.

In the more complex formats, it becomes less accessible.

Perhaps it is added cleverness, abstraction and complexity that creates a sort of "irreversibility" (cannot return to simple) that motivates people to "reinvent the wheel", ignore past work and totally rewrite things from scratch.

What they really want when they do that (methinks) is a "simpler" format to work from, not "simpleton" but not too far from it. Goldilocks levels of cleverness and abstraction.

Similarly, perhaps that unmanageable, irreversible complexity is also what compels people to add more abstraction layers in attempts to "simplify".

In other words, at some point the level of complexity becomes too great for someone to unravel. All they can do is add to it.

I think you're on to something. I compress this to “quantity is easier to manage than complexity”. The simpler version will commonly be “more” in terms of LoC but, as you say, easier to grasp and to work with when done well. I found (the right form of) laziness a good counterforce. I don't want to wrack my brain with an over-engineered abstraction, but I also don't want to squirt the same 15 lines of code all over the codebase. When the criterion for correctness changes, I'll have to edit those 15 lines everywhere, and I'm guaranteed to forget at least one instance.

> Imagine that the requirements are that your program does A, B, C, D and E, in that order.

I find that every year I write software, the less my software looks like "do step A, then B, then ...". It's always becoming more functional and declarative. I'm not sure there's any function in my entire program that has to do 5 high-level calls in order like that.

Without hearing what A/B/C/D/E actually are, this sounds almost straw-man-ish, or perhaps architecture-astronaut-ish.

That’s a shame, I prefer reading very simple code myself.

What's a shame? Functional and declarative code is simpler.

Relevant recent discussion:

https://news.ycombinator.com/item?id=18959636 (John Carmack on Inlined Code (2014))

• This comment, which contains a useful example: https://news.ycombinator.com/item?id=18832382

Please let me know of any further discussion on the topic. (I'm also reminded of Ousterhout's advice in A Philosophy of Software Design that “classes should be ‘thick’” — see talk https://www.youtube.com/watch?v=bmSAYlu0NcY or (unrelated) slides: https://platformlab.stanford.edu/Seminar%20Talks/retreat-201... )

One thing I see people miss in this discussion is that there's a tradeoff between having an individual function be understandable (shorter functions are easier to understand) and having the entire codebase or interrelationships between functions be easier to understand (fewer functions are easier to understand). When there's an example presented like in the post, it's often presented as an example with a single function, and of course four short functions may each be easier to understand than a single function, but you've also added three nodes to the “graph” (of functions and calls between them) and now you need to study whether these functions are called from anywhere. To make up for this one may start introducing classes/objects to indicate “private”/“public” functions, and so on. (Some languages allow nested functions, which can help.)
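To illustrate the nested-function remark, here is a small sketch in Python (my choice of language; the function and its steps are hypothetical). The helpers are short and named, but they add no nodes to the module-level "graph" because nothing outside the enclosing function can call them.

```python
# Hypothetical example: the helpers are nested, so a reader never has to
# ask "who else calls parse() or clamp()?".

def normalize_scores(raw: list[str]) -> list[float]:
    def parse(text: str) -> float:
        # Step A: turn one raw field into a number.
        return float(text.strip())

    def clamp(value: float) -> float:
        # Step B: keep the value inside the 0..100 range.
        return max(0.0, min(100.0, value))

    return [clamp(parse(item)) for item in raw]
```

The cost is that the nested helpers cannot be unit tested on their own, which is the same tradeoff seen from the other side.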

Most of all I'd say reading very different styles of code, and seeing how people achieve great things in unusual ways (e.g. with lots of abstraction, or with very little abstraction), can be an illuminating experience (see my comment in this thread about reading Knuth's programs).

You might like this related comment https://news.ycombinator.com/item?id=13571159

Thanks for sharing; that's an interesting discussion.

To elaborate more on the idea that splitting up functions introduces complexity at the level above functions: Conventionally, programming languages provide only a few levels of abstraction. For example, if a bunch of lines of code are logically related and can be understood as a unit, one may group them into a function. If a bunch of related functions make sense as a unit, one may group them into a class. If a bunch of classes make sense as a unit, one may group them into… a file, say. (If the language allows it: some languages allow only one class per file.) And if a bunch of files make sense together, one may group them into a directory. After that, I guess directories can be grouped together for a few levels; beyond that one may group them into separate packages / modules or programs / microservices / what-have-you.

These different levels of grouping feel different (one tends to have a different reaction on seeing a bunch of microservices than on seeing a bunch of functions). There are probably tendencies to avoid much grouping of certain kinds and have groupings of the kinds that an individual programmer/team likes.

What we really want though, is from the other direction (top-down, not bottom-up): we want to start with the system we care about, have it broken down into “not too many” parts (whatever that means: 3 is probably fine, 50 is probably too many), then have each part broken down into not too many parts and so on. When you break up a function that does A,B,C,D,E into separate ones, you may be introducing abstraction for the sub-parts, but you're losing the abstraction you had at the higher level (that A,B,C,D,E are in some sense one unit) and have to now make up for it in some other way: that's (part of) the cost.

(Aside: WEB/CWEB-style “literate programming” provides one way to have grouping that, by being infinitely nestable, is self-similar: you can have the same kind of grouping at many different levels, and you can organize these however you prefer: e.g. group related sections into a “chapter”, chapters into a “part”, then break the entire “book” if it's too long (though I don't think anyone's done that) into separate “volumes”, or whatever kind of grouping you like. And now you can have less of the language-level abstraction, e.g. an entire chapter may tangle into a single 1000-line function, and a function now is likely to mean something more, such as being called from several places, than merely being the way of achieving grouping.)

As with everything, it's a question of the right balance. Which balance is the right one under what circumstances is a matter of experience. And nobody can make your experience for you, each one of us has to do it for themselves. So never be afraid to make your own mistakes (and learn from them). It is, after all, the fastest way to gather experience.

What would piss me off to no end in this Scala codebase I used to work on was the creation of traits that were then only extended by one object:

trait Foo { val x = 1 }

object FooObj extends Foo

So pointless and unnecessary. And before anyone says this is just planning for the future, the future can go fuck itself.

There's two valid use cases for this:

1. Testing. The trait makes it easier to mock a dependency. In this case the second implementation is in the testing codebase.

2. Quick overview and contract.

If you have a `trait DoThing { def firstThis() def thenThis() }` it very clearly shows what the trait is meant to do with what contract, even if the actual implementation is 500 LOC with 15 private methods.
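The testing case (item 1) might look roughly like this. It is a sketch in Python using `typing.Protocol` rather than Scala traits, and every name in it (`Mailer`, `SmtpMailer`, `RecordingMailer`, `welcome`) is hypothetical:

```python
from typing import Protocol


class Mailer(Protocol):
    """The contract: the only thing callers may rely on."""

    def send(self, to: str, body: str) -> None: ...


class SmtpMailer:
    """Stand-in for the single production implementation."""

    def send(self, to: str, body: str) -> None:
        raise NotImplementedError("would talk to a real mail server")


class RecordingMailer:
    """The second implementation, which lives in the test code."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


def welcome(mailer: Mailer, user: str) -> None:
    # Depends on the contract only, so a test can pass in the fake.
    mailer.send(user, f"Welcome, {user}!")
```

So the "only one implementation" complaint holds for the production code, while the interface still has two users once tests are counted.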

It is this way for large projects. You write your function, the interface is tested, but the whole project becomes a big bloated mess.

I don't think there's any better way to write large projects.

Instead consider loosely coupled smaller projects.

Perhaps this only moves the complexity elsewhere and you have a dependency hell, but arguably the same can be said for versioning parts of code.

Rather than planning for the future, this is usually an artifact of problem solving and designing in the language (they don’t know only one Foo is needed when they are planning out what a Foo should do).

That makes sense if you can then write code using that trait as a more general type.


"Hell is someone else's abstraction." -Anonymous

My own code written six months ago looks like someone else's code.

... and the best abstractions are exactly the exceptions to this rule.

Clever yet true!

This is simply an issue of naming, ensuring people fully understand the problem, and having the appropriate level of granularity (not necessarily abstraction).

Consider these within the context of the A->B->C->D example he gave:




    // Wrapper that utilizes the above functions

The very first reader comment at the bottom of the article is (almost unintentionally) hilarious:

  Why not [A, B, C, D, E].map(do) ?

But it's so elegant! Just make sure they all implement an `interface DoAble { void doIt() }` and we can make a generic do!

Exactly! And all the dynamic contextualization and dependencies of each `doIt` are gathered by the standard Clairvoyance module.

However, this is still incredibly verbose compared to the pinnacle of elegance discovered as early as the 1960s:

None of those modern wimp languages has reached that level of terseness yet.

Have a cookie. You can have another one if you source your concept and cite the prior work; your handwave is useless against a search engine, which turns up this phrase only in your posts on this site.

'When someone says "I want a programming language in which I need only say what I wish done," give him a lollipop.' ~ Perlis

[0] http://www.cs.yale.edu/homes/perlis-alan/quotes.html

I’d bet good money the humor was quite intentional.

Re-labeling and re-arranging actions in a sequence seems like a pretty shallow example of "abstraction."

OTOH, if I used generics or such people would argue about merits of generics and ignore the main point.

ironically, I think the example might be a bit too abstract, but it got the point across clearly (even if I don't completely agree).

Spent some time (slightly) re-abstracting a bunch of code yesterday. I started that particular project trying to follow some of the anti-abstraction gospel that came up on HN in the past months. Expecting new people to join the effort sometime in the future, it seemed to make sense. But certain things just ended up being spread out too broadly over time. I worried a newcomer would miss some related thing that I purposefully left unrelated.

Having read this thread, I really hope that this kind of (comparatively shallow) architectural problem is something AI can help with sooner rather than later. Or at least the onboarding process to a new codebase could be tackled by some smarter tools / better IDEs. Seems like there’d be some real value in that.

This is in line with the other day's post on game loops and inlining functions from Carmack.

IMO one should always strive for simplicity and clarity. And if it is critical code that needs to be secure then there is no excuse for unnecessary abstraction.

Good abstraction is what gives you the simplicity, clarity and security. Abstraction is great to talk about things precisely.

Exactly. IMHO abstractions on the problem-domain are good. But then there are these kind of meta-level abstractions, like factories, proxies etc., which are usually just making matters complicated.

IMHO the "casual" reader should be able to understand the body of one function in isolation; i.e., she shouldn't need to dig through other functions (callers/callees).

This is exactly why all functions should be kept small, and thus it might be preferable to extract small functions.

Proper function naming is critical here, but it works : when was the last time you needed to look at printf's source code to understand your own code?

Same thing for max/fopen/malloc/atoi/scanf/sqrt ... if it works for standard library's functions, we can make it work for our own functions.

Standard library functions are well understood because they are shared between all projects using a language. Their naming doesn't even need to be descriptive, because people will learn them regardless.

But functions which only exist in your project are a completely different beast. It is often very difficult to capture what they do in the name, even when you try to - especially for the higher level functions, which do a whole lot of things (by calling other functions). And, even if the name was originally good, it could become misleading when new use cases are added later - I found this problem very often when working on legacy codebases.

Using lots of small functions isn't a panacea, and is completely different from functions in the standard library.

A good anecdote about this: I once wrote a file-locking function for a company that took about 15 minutes to write. The function name was something like "file_lock". But then, it had to be modified to work on Mac and Windows. Then again, it had to handle re-entrance from the same thread. Again, it had to be modified to lock network files, FTP files, you name it.

I worked on that initially tiny function on and off for two years. Anyone just reading the function title and comments along the way could easily assume it did things that it didn't.

First of all I agree with the top (currently) commenter that the examples the author gives are not really abstractions, at least not of the kind I thought this would be about.

Anyway, my thoughts on the subject:

Of course the problem with abstractions is when they leak; when you read an unfamiliar codebase for which you do not (yet) understand the abstractions and the reasons behind them, your brain will be leaking cases all around them. There are ways people try to prevent that; 4gl, xml/json ‘definitions’, DSLs etc but unless you actually deliver the ‘runtime’ or libraries (classes (sealed etc in C#) and frameworks) as binary, your new colleagues will look under the hood and probably cry until either they get it or find it is actual crap.

In the end it never really works if you go broad (let too many people look under the hood) because people want to/have to break out of the abstraction and start to abuse the leaks. And then the result will become an absolute mess.

I did a lot of successful projects with very high level abstractions but they all rot when another team takes over. That is normal for most code imho though; I measure success in decades not years and I have stable ancient code running in many companies/online which use the abstractions and the shielding to keep those transactions from leaking for all those years. Yes, it limits what you can do with it, but if that is a given and the business case holds up and management is not swayed by ‘but I read now we need to use something called React/Redux’ then there is a lot of mileage to be gotten out of effective abstractions.

Abstractions need proper names and meaning. In this example the "abstracted" version would be a lot simpler if the names of the functions would be 'doB()', 'doC()' etc. and it is still hard to make sense of as A, B, C, D, E as an example doesn't really have an obvious meaning. Just an order. But if the main function describes a process that must happen in a specific order, then by all means keep the code in the single function if it makes it more clear.

Agreed. I personally find it very difficult when abstractions are named after the design pattern rather than the purpose. E.g

  class B
  import com.acme.b.BRepository
  class BImpl extends B
  b.process(); // this is all we want, doB()

My favorite abstractions:

~ == home directory

. == present working directory

.. == parent directory

Names are abstractions. If there's an itch to apply some abstractions on a project, begin with good, very carefully considered names for functions, variables, and classes.

The main difference between the projects that make me want to pull my hair out and the ones I can actually understand and immediately begin working with is simple: good projects use good names for things.

This is why I think a language with refinement types (and maybe dependent types) as well as an effect system would be great as more could be inferred from the function signature. You would still need to go down the rabbit hole of abstractions a bit but at least you know what you're getting into and if the abstraction is useless.

This isn't code abstraction, just splitting up long functions. Obviously if you name your split functions "foo" and "bar" then it hampers legibility. But if you name them "doC" and "doBCD" then you can read them fine.

As far as first impressions go, I would much rather see all the noun entities in a system abstracted into classes; and verbs implemented as methods.

Well abstracted, fluent code is self describing.

  new Invoice().withParam(param).generate().deliver( person );
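For what it's worth, a chain like that only works if every method hands back the object. A minimal sketch in Python; the `Invoice` fields and method bodies here are hypothetical, and only the shape of the chain comes from the line above:

```python
from dataclasses import dataclass, field


@dataclass
class Invoice:
    params: dict[str, str] = field(default_factory=dict)
    body: str = ""
    delivered_to: list[str] = field(default_factory=list)

    def with_param(self, key: str, value: str) -> "Invoice":
        self.params[key] = value
        return self  # returning self is what makes chaining possible

    def generate(self) -> "Invoice":
        # Render the parameters in a stable (sorted) order.
        self.body = ", ".join(f"{k}={v}" for k, v in sorted(self.params.items()))
        return self

    def deliver(self, person: str) -> "Invoice":
        self.delivered_to.append(person)
        return self


invoice = Invoice().with_param("customer", "ACME").generate().deliver("alice")
```

Note that the chain reads well but hides the ordering constraint: nothing stops a caller from delivering before generating.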

The author doesn't go into enough detail about the nature of the blocks of code (how big, what they do) as that will most certainly have an effect on the decision to break it up into functions.

I think if A, B, C, D, and E are mutating functions then yes, it may be sillier to break them up if they all contribute to a core, almost atomic mutation. However, if they were functional, breaking them up may actually make sense and be more readable.

  let a = A(x)
  let b = B(a, y)
  let c = C(b)
  let d = D(c)
  let e = E(d, z)
Then you can unit test each function and read each one in isolation to understand what transformation is being done on the data.
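A runnable sketch of the same shape in Python. The bodies of the five steps are hypothetical placeholders, since the thread never says what A to E actually do:

```python
# Hypothetical stand-ins for A..E: each is a pure function of its inputs.

def a(x: int) -> int:
    return x + 1


def b(a_out: int, y: int) -> int:
    return a_out * y


def c(b_out: int) -> int:
    return b_out - 3


def d(c_out: int) -> str:
    return str(c_out)


def e(d_out: str, z: str) -> str:
    return d_out + z


def pipeline(x: int, y: int, z: str) -> str:
    # Same shape as the let-chain above: data flows strictly forward.
    return e(d(c(b(a(x), y))), z)
```

Because no step touches shared state, a test for `b` needs nothing but two integers, and the order of steps is enforced by the data dependencies rather than by convention.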

That abstraction is trivial. The real problems are the unintelligible ManagerConfigContextInjectorProtoFactory style classes that seem to proliferate in big corporate projects.

I wrote an academic paper on the "Costs of Abstraction" in 2004 -- I recommend interested parties check it out!


This reminds me of Laravel: their codebase sometimes feels like a maze even with a very decent IDE. Too many abstractions, and facades make it even worse.

Edit: Baklava code is a good term for it


Abstractions obey the same hierarchy as relationships: great abstraction > no abstraction > bad abstraction.

About abstractions, even the best non leaky abstraction may begin to leak over time as the reality it originally ( and successfully ) captured has moved on.

This is just a price of doing business, abstractions should be maintained like anything else.

This isn't even abstraction, it's just encapsulation of code into functions.

If you're fundamentally against that, I pity anyone that has to attempt to read and understand anything any bigger than a toy example.

ALL programming languages are abstractions. That's why we need compilers/interpreters to make the source code have an impact on the world.

It's hilarious that so many people are pointing out that this argument against abstraction would not be convincing if the author had used good concrete function names.

Looks more like indirection than abstraction to me, but then again someone could say that abstractions are just indirections one likes.

Recently, I've been reading the TeX program by Donald Knuth, and some related programs. On the one hand, they are intended to be read (written using the “literate programming” style he developed), so they ought to be easier to read than typical programs (and maybe they are, compared to circa 1980 Pascal programs of similar size). On the other hand, I… struggle.

In these programs, Knuth uses very few abstractions. I think there are two reasons for this:

* These programs were written in the “standard” Pascal as of 1980, a language which had many limitations (see Kernighan's essay on the topic) — and on top of that, Knuth wrote these programs to be as portable and memory-efficient as possible, across the many (often poor) Pascal compilers that were available to typical users at typical computer installations.

* To some extent, I suspect it is just Knuth's style and preference. As a programmer who started his career writing assembly programs in his own style (a story from 1960: http://ed-thelen.org/comp-hist/B5000-AlgolRWaychoff.html#7), even when given a supposedly “high-level” language, he probably thinks very close to the machine. Even as a mathematician (a profession often believed to be fond of abstraction), his style is characterized less by abstraction-building and more by deep attention to detail and “just do it”: he calculates constants to 40 decimal places; he does not stop at analyzing Big-O complexity (despite having popularized the notation) but finds the actual constant factor; he does not analyze on an abstract model of computation but writes assembly programs for a specially-designed mythical computer and analyzes them; and so on.

Nevertheless, his literate programming seems to be a way of solving the problems for which abstraction is often used as a solution (comprehensibility, etc), without introducing abstraction (and paying its cost): you can have functions that are hundreds of lines long, but presented in understandable (dozen-line or so) chunks.

Here's an example from something I was reading today. Unfortunately to make sense of it you need to be somewhat familiar with the conventions of Pascal, of WEB, and the current place in the program, and turn off your instinctive horror of non-monospaced code and the weird (for today) indentation. (Some background here: https://shreevatsa.github.io/tex/program/pooltype/ and https://shreevatsa.github.io/tex/program/tangle/) Anyway, the example is this: see sections 53 onwards: https://shreevatsa.github.io/tex/program/tangle/tangle-6 — here you have a single function `id_lookup` that is hundreds of lines long and has very complex case analysis and code paths, but chopped up in a way that makes it less forbidding to understand.

I think though that programmers today have a strong cultural bias in favour of abstraction, so they tend to have a strong negative reaction to code like this.

Literate programming is ALL ABOUT abstraction! Every named section represents an abstraction. E.g. look at section 53 in the code you link to:

    begin l←id_loc−id_first; { compute the length }
      <Compute the hash code h 54>;
      <Compute the name location p 55>;
      if (p = name_ptr)∨(t = normal) then
        <Update the tables and check for possible errors 57>;
What are the references to 54, 55, and 57, if not abstractions?

I agree that Knuth style literate programming uses a quite different abstraction style than programmers are used to, and it's difficult to adopt and do well. But it IS using abstractions, and very extensively so.

Thanks for reading. I understand what you mean, and I agree with you that what you identified is in some sense what literate programming is all about. I think debating whether that's precisely what is meant and is discussed as “abstraction” probably gets into motte-and-bailey (https://slatestarcodex.com/2014/11/03/all-in-all-another-bri...) territory: by defining “abstraction” sufficiently narrowly or sufficiently broadly, nearly anything being discussed in this thread can be said to be abstraction or not. For example, while here you're pointing out that giving a name to a bunch of consecutive lines of code is indeed abstraction (and I'm inclined to agree with you), currently the top comment on this thread argues that even splitting code into functions isn't “real” abstraction (https://news.ycombinator.com/item?id=19011659).

So instead of trying to define “abstraction”, let's try to discuss the tradeoffs: I think what we're discussing is abstractions that are “formal” and “strong” versus those that are “lightweight” and “flimsy”. Look at the original example in the post:

    void main() {
        // Do A.
        // Do B.
        // Do C.
        // Do D.
        // Do E.
    }
One could argue that here too there is “abstraction”, via the comments: they identify and document some related lines of code as doing something. When we move to the rewritten example:

    void bar() {
        // Do C.
    }

    void foo() {
        // Do B.
        bar();
        // Do D.
    }

    void main() {
        // Do A.
        foo();
        // Do E.
    }
what's changed is that these abstractions have become more formal, more solid: grouping code into a function (I just left a comment about this: https://news.ycombinator.com/item?id=19012979) (instead of just with a comment) is a form of abstraction provided by the language, with stronger guarantees, such as that foo() cannot use variables that occur internal to bar().

Literate programming is somewhere between the two. The sectioning it provides is not a language-level construct; it's just textual (macro) substitution, and if you wrote something like that in C++ with #defines, you'd probably get yelled at for doing something dangerous instead of the abstractions (like functions) that the language provides. The literate-programming sections are very “flimsy”: code in one section can and does use variables defined in another (note all the function's variables are defined in section 53, and the function also uses global variables like `buffer` and `hash`); there's a “goto found” in section 56 where the label “found” is defined in section 55.
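To make the "just textual substitution" point concrete, here is a toy tangler in Python. It is a sketch under loud assumptions: it uses noweb-style `<<name>>` references rather than WEB's actual syntax, and the chunk names and contents are hypothetical. It is not how TANGLE itself is implemented.

```python
import re

# Hypothetical chunk table in the spirit of WEB: names map to raw text.
CHUNKS = {
    "main": "total = 0\n<<add the items>>\n<<report the total>>",
    "add the items": "for item in items:\n    total += item",
    "report the total": "result = total",
}

# A line that consists only of a chunk reference, with optional indent.
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def tangle(name: str) -> str:
    """Expand a chunk by pasting referenced chunks in place, recursively."""
    out = []
    for line in CHUNKS[name].split("\n"):
        match = CHUNK_REF.match(line)
        if match is None:
            out.append(line)
            continue
        indent, ref = match.groups()
        # Pure text substitution: no scope, no parameters, no return value.
        out.extend(indent + sub for sub in tangle(ref).split("\n"))
    return "\n".join(out)
```

Running `exec(tangle("main"), {"items": [1, 2, 3]})` shows the flimsiness: the "add the items" chunk freely uses `total`, which is defined in a different chunk, exactly as the sections of `id_lookup` share variables.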

Abstraction, beyond just naming/chunking, involves bringing into existence some new entity. The more “formal” this new entity is, the more you have to worry about things like: its place in, and relationship with, the rest of the system (e.g. what are all the function's callers); its interface/boundaries (under what circumstances does it make sense to call this function); how much of a clear meaning it has, and so on. With just a named section, there's less of that. The same things that make the latter kind of abstraction “flimsy” also make it have less cost.

I agree that literate programming chunks can be very leaky abstractions, and it's a good point that LP systems provide no guarantees whatsoever about interfaces. But for precisely that reason, LP chunking sometimes is useful in situations where procedural abstraction is limited: In algorithms where the various sections are connected with very wide interfaces. If all you have are procedures, you either pass around a big struct, or an enormous number of procedure arguments. In object oriented programming, the big struct is implicit in the self/this pointer. LP gives another alternative.
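The "pass around a big struct" option might look like this in Python. The names `buffer`, `h` and `id_lookup` are borrowed from the discussion above, but the logic is a hypothetical toy and not Knuth's algorithm:

```python
from dataclasses import dataclass, field


@dataclass
class LookupState:
    """Hypothetical 'big struct' holding everything the steps share."""
    buffer: str
    table: dict[int, list[str]] = field(default_factory=dict)
    h: int = 0
    found: bool = False


def compute_hash(state: LookupState) -> None:
    # Toy hash: sum of character codes, reduced to 8 buckets.
    state.h = sum(ord(ch) for ch in state.buffer) % 8


def find_or_insert(state: LookupState) -> None:
    bucket = state.table.setdefault(state.h, [])
    state.found = state.buffer in bucket
    if not state.found:
        bucket.append(state.buffer)


def id_lookup(state: LookupState) -> bool:
    # Each step needs most of the state, so the struct is passed whole.
    compute_hash(state)
    find_or_insert(state)
    return state.found
```

The procedures are formally separate, but since each one reads and writes the same struct, the interface between them is as wide as it was before; the struct only makes that width explicit.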

Right, I agree; that was exactly my point too. :-) Maybe I worded it poorly initially: The original post was about the cost of "heavy" abstractions (functions), instead of something lightweight like comments and naming. (In searching for "the cost of abstraction", I just now found another post by the author: http://250bpm.com/blog:86) I agreed with this, and gave literate programming and the programs of DEK as an example (an alternative to “formal” abstraction): a prominent programmer finding ways to achieve things with only chunking/“lightweight abstraction”.

(BTW, the programs I mentioned also eschew other sorts of abstractions like abstract data types, e.g. all memory is laid out in huge arrays and manually managed — it seems to work for him. But that may have been a constraint of the language for all I know.)

A quote I heard recently:

Any problem can be solved with another layer of abstraction...

This sounds like a modification of David Wheeler's quote:

    There is no problem in computer science that can't be solved using another level of indirection.

And we also have the modification of Rob Pike:

"There's nothing in computing that can't be broken by another level of indirection."


Yes I think that was it. Makes me rethink every abstraction I add to solve a problem.
