Abstraction is expensive (specbranch.com)
224 points by pclmulqdq on Dec 7, 2022 | 139 comments



Lack of abstraction is also expensive: try writing something large in assembly.

I'd say that what's expensive is the lack of a language / system of notions adequate to the subject area. The desire to describe things in a way that's efficient for a particular class of problems leads to the invention of various frameworks. Say, Rails makes you hugely productive at solving a particular type of problem (see [Shopify]), though it's less than helpful if you try to apply it to ill-fitting problems (see [Twitter]).

[Shopify]: https://tomaszs2.medium.com/how-shopify-handled-1-27-million... (Sorry for a Medium link)

[Twitter]: https://www.theregister.com/2012/11/08/twitter_epic_traffic_...


I wrote the piece. The argument I tried to get across is: "You must use abstractions. Find the ones whose values align with your interests."

I find people frequently torturing their problems into abstractions they like rather than finding abstractions that work for their applications, and this is very much wrong.

Edit: By the way, the examples are fairly low-level databases because that's what I know about. If I knew about web frameworks, I would have used those instead.


One point about abstractions that you might want to make in a future article is that developers tend to discount the cost of all abstractions that they have internalized. Therefore when they get to a new environment, they immediately try to recreate and incorporate the abstractions that they used previously.

However these abstractions are not free. They are not free for performance. They are not free for debugging. And they are especially not free for any future developer who is not familiar with them.

The joke I grew up with was, "Andy giveth, and Bill taketh away." It is no longer Intel under Andrew Grove who gives us better performance. It is no longer Microsoft under Bill Gates who gives us slower software. But the basic phenomenon is still true. Hardware improves over time. But developers happily introduce abstractions with no awareness of the associated costs.


Hard agree. On the flipside, devs also tend to overestimate the cost of already-implemented abstractions that they haven’t yet internalized, leading to complaints and lengthy refactors when they could have just spent the time to learn the abstraction…


You wrote: <<They are not free for debugging.>> Thank you. Finally, someone acknowledges this important point! I have literally been told that adding extra local variables was "increasing stack size; must be slower". But what about debugging? And, yes, there is a balance to be had between maintenance and speed. Example: Breaking down a critical path section of code into a bunch of tiny non-inline functions may slow down the code unacceptably. In other cases, it may make the code much easier to read, maintain and debug... so the trade-off is acceptable.


The same can be said about organization. Too much abstraction (managers and teams) harms performance; too little is hard to coordinate. Putting the abstraction in front of the problem looks like: throwing the problem at a team that was set up arbitrarily beforehand, with or without the right mix of expertise for the current problem; spinning up libraries/frameworks automatically when starting a new project; forming the same teams that worked at a previous company. You end up with abstractions so leaky that they require either code changes to the library or "octopus" code that links up functionality from seven other libraries, or with teams formed without regard to problem boundaries, where each problem needs cross-team coordination across seven departments. Software engineering and organization design have very much in common.


The material is good, nice going. But the title is misleading.


I would argue that abstractions are usually expensive.

It's just that sometimes not doing something expensive is also expensive. Getting your car regularly maintained is expensive, so is not doing that.

The issue with abstractions is that people have not internalized them as "an expensive thing we do to prevent ourselves from running into expensive problems later"; they've internalized them as a zero/low-cost process that has no downsides and thus should be pursued all the time.

It's like if people said, "not getting your oil changed will be more expensive, and therefore I get my oil changed every other week."

I kind of think that the only reason that the phrase "abstractions are expensive" sounds like a controversial take is precisely because people have not internalized that abstractions are not a binary good/bad thing and that they should be applied situationally, because they do have maintenance costs and development costs.


That's very well-phrased, and you beat me to this comment. :)

I wanted to get across that an abstraction is an expensive thing you should use to solve a difficult problem, not a cheap thing you use to solve a simple problem.


> abstraction is an expensive thing you should use to solve a difficult problem, not a cheap thing you use to solve a simple problem

Nailed it.


Sure, the statement isn't wrong. But the content is mostly about making good/consistent choices given the situation.


> But the content is mostly about making good/consistent choices given the situation.

That's what I'm saying though. You need to make good/consistent choices about the given situation because abstractions are expensive, not free. If they were free, we'd just throw them everywhere with no thought.


What you are saying isn't wrong, but when someone reads "abstraction is expensive", they immediately assume the implication "and hence you shouldn't use it".

To make an analogy: people shouldn't buy a sports car if they want to take their family of 5 skiing every weekend. A person might reasonably write an article on all the factors one should consider when buying a car. They shouldn't title said article "Cars are expensive", even though the reason they wrote the article is because cars are expensive and hence the choice is important.


> but when someone reads "abstraction is expensive", they immediately assume the implication "and hence you shouldn't use it".

I suppose? But if someone's attitude is that expensive things should be universally avoided, that seems like a problem that's going to need to be addressed sooner or later.

Would we be having this conversation under an article titled "abstractions have benefits"? Would we be worried that someone is going to look at that title and think, "I should use them everywhere all the time then"?

I don't know. I'm not against phrasing something in a way that minimizes confusion, but on some level I think that internalizing that programming concepts aren't binary good/bad is arguably one of the most important lessons that a programmer can learn. And I regularly see a kind of pushback to the (correct) notion that abstractions have costs that I don't see in other contexts. But that could just be me, and my experiences might be different from other people's.

This is largely a subjective take by me, so I understand if people disagree with it, but the prevailing attitude I personally see in software development is one where people do not realize that there is a cost to abstractions, and in fact sometimes bristle at the idea that they do have costs. At most, I can get people to agree that 'bad' abstractions have costs, but it is much harder to get them to say, "it's possible to abstract too much, and even good and necessary abstractions are still additional code with additional costs."

So I think the notion that it is not always correct to abstract every piece of code is weirdly controversial -- and the underlying idea behind that that I think people haven't internalized is "sometimes good things have a cost, and we should talk about the costs that they have."

But again, could just be me. If a bunch of people are confused over the title, then... I mean, I can't tell people what they should and shouldn't be confused about. If it's better to communicate with them in a different way, then it is what it is. It just worries me if on a programming site so many people interpret "is expensive" as "should never be used." That's not a good programming philosophy for people to have.


You shouldn’t use any abstraction unless you have a good reason to do so.

Why? Because it has costs which are most heavily paid by those who come after you and read or modify the code later.

A small amount of the right abstraction helps understanding, too many irrelevant abstractions just gets in the way (BeanFactoryFactory etc).

I’ve only seen people use too much abstraction, not too little, so the title makes perfect sense as it is to me and has the right message IMO.


Thank you. I will add that the title is intentionally provocative. However, I want to encourage developers to think of an abstraction as something you pay a significant cost for rather than something you get for free, and use that logic to make good choices.

In a sense, you hire abstractions a little like you hire employees. Interview them and make sure your values match.


An aside - if you haven't read any Michael Porter, you might find it interesting.

https://iqfystage.blob.core.windows.net/files/CUE8taE5QUKZf8...

Slightly different concept, but similar ideas around consistency given a set of explicit choices/constraints.


"abstraction in expensive" makes people think you are implying one shouldn't use abstraction. It's all well and good that the content doesn't say that, but as journalists say: "don't bury the lede".


Nah, it’s perfect. Provocative, technically true, and when you come back to the title it’s “I have a new insight into this now”.


IME the best abstractions hide the underlying complexity by default, but allow you to "pop the hood" when necessary. The more this exposes the guts of the underlying implementation details, the better (though obviously that comes with tradeoffs around changing underlying impl details, although I think as an industry that we over-index on that too much, which leads to exactly this problem).

Go's compiler intrinsics immediately come to mind here. They allow you to easily write Go functions in native assembly, without CGo, for hot code paths. We get the benefits of a high-level language with a fat runtime, but can easily drop down to assembly when we need to.


In my opinion, good abstraction layers should also have a way to peel things back, like the layers of an onion.

Taking the author's example of a TCP network stack, you often can't do that, since the OS won't let you have that low a level of access by default without using or writing some custom driver; the OS ends up trying to isolate user-space completely.

Kinda makes me wish more research had been done on things like exo-kernels, where the OS is mainly concerned with security and not the abstraction. All the abstraction runs in user space on such a kernel, and you can choose what level is suitable for what you're doing. https://en.wikipedia.org/wiki/Exokernel


The headline is sort of misleading. It is (appropriately) primarily about inappropriate abstractions. As you say, some abstractions are unavoidable. Even assembly language is an abstraction.


I think arguing that assembly is an abstraction is going the wrong direction on why it is hard. Assembly largely forces numerical abstraction on your problem, which is a lot harder to reason about than folks want to acknowledge.

That is, higher level languages let you get farther away from the abstraction that is the computer itself. As such, you can have a less abstract program in a language that has higher abstraction away from the execution environment.


I pick only the best artisanal integer representations for my programs. Only fools use abstractions.


> Even assembly language is an abstraction

Is this true? I thought assembly mapped 1-to-1 with actual HW instructions. If so, assembly wouldn't be an abstraction, it would be an interface.


>I thought assembly mapped 1-to-1 with actual HW instructions.

Assembly encodes a wide variety of abstractions (it is a human readable format after all) and lots of assembly instructions have a clear relation to an instruction on the hardware, but definitely not all. E.g. a CPU does not understand what a "label" is, and the semantics of labels and jumping to them are removed by the assembler.

But not even the assembled binary actually maps to executed instructions. The CPU is actually a virtual machine which presents itself as e.g. an x86 ISA interpreter, but internally it uses microcode executed in various performance enhancing ways to speed up the process.


First of all, you normally use symbolic labels, not offsets, for jumps; the assembler will calculate them for you, and a linker will possibly build a relocation table based on them. Then, any good assembler has macros. Also, data / text blocks, etc., that are not code but abstractions which the linker later uses.

Writing machine code directly, as a byte stream, is fun, but is exhausting.


Arguably it's more of a translation layer than an abstraction. You're not abstracting away any concepts or code blocks here; you still have to write every single instruction, you just get a bit of help with the math.


Assemblers like nasm have an entire macro language. Of course that isn't part of the ISA. But in the end even the ISA is fiction.


Nope, the CPU presents you an interface of the available instructions but those instructions are a complete fiction on modern processors and don't have to match at all what the underlying hardware does. The CPU guarantees that the observable effects of the instructions are consistent (i.e if the processor wants to do some fuckery it can't change the instruction semantics) but beyond that is free to run your code however it wants.


Simpler RISC designs, like simpler ARM cores, more or less directly execute instructions; same for old 8-bit cores you can still widely find in MCUs. Complex high-performance cores, with pipelining, instruction fusion, and OoO execution, turn instruction stream into a microcode lava.


So maybe the right thing to say is that assembly is a contract between you and the processor and the underlying implementation can map directly to hardware when appropriate but it doesn't have to. It is an abstraction over different underlying hardware implementations.


It is true. Abstraction means that you don't have to worry about some lower-level concerns. With computers, the rabbit hole goes quite deep:

High-level language (C/Python) -> Assembly language (AT&T/Intel) -> Byte code -> Micro code

So, with assembly code, you don't care about what the actual bytes that get generated are (they can change and you'd never know, if the system was backwards compatible). Similarly, if the microcode that the CPU generates to implement the instruction changes, you'd also never know.


> I thought assembly mapped 1-to-1 with actual HW instructions.

on x86 it's a 1-to-many relationship. There are, for example, many different MOV instructions that can be encoded based on the parameters used. All assemblers I know of hide this from the user, as well as featuring labels, macros, etc. which are not defined by the hardware at all. For the most part, the x86 assembler hides the details of prefix bytes and ModR/M+SIB from the user. Some assemblers are quite advanced and if you took them just a few logical steps beyond where they are you would end up with C.


> Is this true? I thought assembly mapped 1-to-1 with actual HW instructions. If so, assembly wouldn't be an abstraction, it would be an interface.

Assembly gives you a nice sequential list of instructions, completely hiding pipelining, speculative execution, and all the magic that modern (since the 90s at least) CPUs do to do the job fast.

And as the Spectre family of CPU vulnerabilities showed, these abstractions are in fact more leaky than most people assume.


And cache management, thread switching optimization/prioritization, and so on.

If a CPU were to ONLY do the instructions that were written in the assembly code, the code could be maybe 5-10 times slower when using cached memory and 100x slower if accessing RAM constantly.

And this is part of the reason why it's hard to write assembly that is faster than a well-optimized C/C++ program. The C compiler "knows" (to some extent) what the machine code leads to at the hardware level, and will often create machine code that is more likely to allow the CPU to reap all such advantages in a way many assembly programmers wouldn't know about or think of.


If you're not manually setting voltages on different pins then at some level it's an abstraction.


This is true. Ignoring the labels, macros, and directives, there are many ways in machine code to encode most common x86 instructions (zeroing a register even has many possible assembly instructions!). Assemblers pick the best encoding.


Is that true? They swap actual instruction mnemonics as written in the assembly source into different instructions, for performance? I hope that is controlled by some flag or something, seems like a strange thing for an assembler to be doing unless asked.

I seem to remember the venerable combined assembler/monitor/editor ASM-One [1] on the Amiga having a mode to do that, an "optimizing assembler", but I think it mainly worked with instruction and data sizes, i.e. optimizing short jumps into branches which were cheaper on the 68k.

[1]: https://en.wikipedia.org/wiki/ASM-One_Macro_Assembler


I believe that there are some assemblers that will swap "mov rax, 0" for "xor eax, eax" (which is smaller and faster), if you let them, but not all of them. Some instructions like "jmp LABEL" correspond to many different options (short relative jumps, long relative jumps, absolute jumps, etc.) and the assembler picks the best one.

As another example, almost all of the vector instructions have an encoding with a VEX prefix (an AVX encoding) as well as the older SSE encoding. If you mix the VEX and SSE encoded instructions, there can be a big slowdown, so an assembler will give you VEX encodings if you have AVX-only instructions in the stream, and default to SSE encodings if you don't (they are often smaller).

Some instructions, like LEAs and ADDs, have several different encoding options corresponding to different operand orderings, and an assembler will pick the best one - some of these encodings will force an extra SIB byte when you use R12 as an operand, for example.

This is kind of assembler-specific in terms of how smart it is. I'm not sure that the dumber assemblers do this for you.


That's why real programmers write binary object code directly.


Don't tell them about the lost art of loading microcode on boot.


Binary is an abstraction. I tap the clock and data lines manually with a battery. Boy are my fingers tired.


I wire my magnetic core with artisanal copper.


You use copper? These kids today...


Relevant xkcd:

https://xkcd.com/378/


Dammit, emacs!


Ah yes, the All Abstractions Are Equal fallacy.


ruby is a poor abstraction AND inefficient. text based manipulations and crap like IFS are terrible abstractions. languages like standard ML were made SPECIFICALLY for abstracting. they literally were meticulously designed to provide proper abstractions (think, for example how the elements of a union type are abstract objects, and can be built up to model coherent ideas, while in ruby any variable probably requires a lookup at runtime in ascii^Wunicode^wunicode-minus-whatever-feature-was-found-to-be-harmful-in-this-context)

what im saying is that the academic idea of abstraction is far better and less leaky than ruby (dynamic name resolution and such "scripting language" crap) or java (data structures are intended to be used once or twice per program (exaggerating) instead of fluently like you would make trees in a functional language); its just obscured but if everyone understood the motivation it would have replaced stuff like ruby in all use cases.

i see ruby and java as a little shop where the owner will let you instantiate one or two objects according to some terms, and then let you maybe make your own class, but it will essentially just be a wrapper around a few of the shop's objects. whereas in ML you can model anything from the ground up, even numbers, and it will still be nicely implemented and the abstractions will be almost perfect.


Right. Abstraction has cost. It may or may not be expensive. Know the cost and then decide if it's worth the cost.


"Lack of abstraction is also expensive: try writing something large in assembly."

What about writing something small?


And then using that small thing as a subroutine that you can call into to repeat that same functionality...


Lack of abstraction is also expensive: try writing something large in assembly.

I can't think of a single "best practice" abstraction we have in web development that is anywhere near as watertight and useful as what C offers over assembly.

React VS The DOM is more like CFront over C.


One of my favorite quotes about software engineering, sorry not sure who to attribute it to:

"First you learn the value of abstraction, then you learn the cost of abstraction, then you're ready to engineer"


Junior, mid, senior? Explains why the mid phase is brief.


I like abstractions when they are transparent enough that it's easy to tell how it could be implemented one layer down.

There might be many implementation details that you hide under the abstraction, but if the interface is so abstract that I can't envision a straightforward implementation of it just based on the interface, there's probably something wrong with the abstraction.

Additionally, if the behaviour of the implementation conflicts with the simplified model communicated by the interface, that'll also cause issues.


Similar to liking them for being easy to see what they mean one layer down, it is also nice to know what they mean one layer up. Your program has to inhabit a middle layer between what it is you want, and how it is that it will be executed.

Sometimes, we can get lucky and a declarative statement of what we want works. Often, that isn't the case.


I think it's a meme now to hate abstraction. In big tech I see it as a justification to build parallel systems that do almost the same thing. Abstraction done well is where technical leverage comes from. Abstraction done poorly is where technical debt/bad technology comes from. Management prefers the safe route of gating their engineers from abstraction to get the work done with certainty, the trade-off being less technical leverage. It's a corporate engineering meme to pin tech workers into being replaceable cogs pumping out low-abstraction widgets.


> Abstraction done poorly is where technical debt/bad technology comes from.

I agree, but it is not the only source of technical debt. I tried to work on a batch job that ran a company one time, it was just a top to bottom script, if-statements, loops, etc... nested 17 (!) deep in places: an if inside a for loop inside a while loop inside an if inside an if inside a while loop inside an if inside a for loop inside a while loop inside an if, etc. The guy who wrote it didn't know what a function was.

Believe me, you can have technical debt even without abstraction.

One of my frustrations is that it takes a lot of skill to rise above "big pile of if-statements" programming, and it's so easy for a good abstraction to crumble back into a "pile of if-statements" if even one developer working on it doesn't understand the abstraction.


I wish it was a meme. The things people complained about decades ago are still happening today, and the practitioners are all too eager to turn a 100 line program into a 1000 line one, while taking weeks to test if it does what it should.

The average loud dev is a GoF fanatic in love with inaccessible reflection. Double points if their tools are still stuck in a time where stack traces weren't extended to deal with such practices better.

And with everything being sliced into pieces: the fun has only just begun


Call me blunt, but I've come to the conclusion that most people aren't intelligent enough to create complex abstractions.


Call me blunt, but this statement feels about as hard-hitting and impactful as someone shouting in a crowd that they like pastrami sandwiches.


Every person has a limit on how complex an abstraction they're able to properly create. Then there is another, usually lower, limit on how complex an abstraction they're able to understand well enough to make use of it.

In a lot of cases, people try to leverage abstractions above that second level (for them), in which case they're simply imitating in a cargo cult manner some abstraction that will often not work well if not properly understood.

This is where the atrocities start, imo.

And the sad part is that many of these programmers do have a fairly good understanding of the business needs of the stuff they're making, and often those things are quite simple from a technical perspective.

Programmers that might have been able to create perfectly usable apps in VB or something similar 20 years ago are now only churning out useless garbage in some modern cloud-based stack. (Or, at best, they are using 80% of their time struggling with their tech, while with a simpler setup they could have spent only 20% on the tech and used 80% of their attention on addressing business needs.)


I don't think intelligence or lack thereof is the problem. Rather, it seems people are paralyzing themselves by enjoying the endless discussions, stakeholder indecision, or boredom. All leading to these grotesque things which lack words in the right places, but have plenty in the wrong places.

The average web dev CRUD job isn't interesting enough to warrant that much intelligence.


Intelligence is definitely a factor. In fact, more often than not, when I encounter a person or team that seems to spend their time on everything else except addressing their most important problems, the reason for the apparent procrastination is that they simply do not even know how to START solving the problem.

This is not exclusive for developers. It happens in management, too. Which is a second source of the problems you describe.


my experience is quite the opposite, it's much more difficult to create a simple abstraction as opposed to an overengineered and/or overgeneralized one.


complex != overengineered and/or overgeneralized

some technical problems are actually just complex


sure, but it's easier to solve a complex problem by building an abstraction that is more complex than the problem requires, than to build an abstraction that is as simple as possible (given the complexity inherent to the problem). then, once you have your too-complex abstraction, it can be difficult to restrain yourself, especially as an eager junior developer, from taking your amazing feat of abstraction engineering to its logical, generalized conclusion... or at least dying trying.

"see, now, every time I want to do x, all I have to do is y and z, instead of a, b, c, and d" is an incredibly addictive drug! (I write this as a recovering addict.)


right, but my point is that's what we call a bad software developer. and most software developers are warned, because real abstraction is beyond their capability. a world class software engineer knows that maxim, but also is capable of building an abstraction that lasts.


I'm certainly not. I try to compensate when I can by aggressively rewriting until an abstraction stops breaking things and feels natural


I blame GoF for giving us the two worst decades in software. But in my experience, the average loud dev has mutated into the TDD + DDD fanatic. These insufferable simpletons won't only criticize your end result when it doesn't align well with their preferred abstractions but also tell you _how_ to reach that end.

Something is also brewing with functional concepts and category theory trickling down... and it doesn't smell good.


GoF = Game of Failure?


Gang of Four, as the authors of the Design Patterns book are known (and by extension the book itself).


Oh right, that book is so old, I forgot that it ever existed ...


I find this seemingly trivial insight very useful:

Everything has costs and benefits, and they need to be weighed against each other.

If you look for it, you'll see tons of arguments like "this has costs, so it's bad" and "this had benefits, so it's good".

Both are missing half of the analysis!


Absolutely. Most political "debates" suffer from this.

"Everything is a trade-off" is similar - trivial yet profound.


I've often described the type of Engineering I do as "the art of compromise."


Abstractions are inevitable in software so the important thing is to think about, recognize, identify, and manage them. Sometimes it's appropriate to "unroll" bad abstractions in a codebase.

quotes I live by (Mostly accurate from my memory):

Repeated code is better than the wrong abstraction - Sandi Metz.

Always know what the abstraction is, its value, and its cost - Kent Dodds.


ScyllaDB is, ironically, maybe one of the worst examples the author could have come up with for "abstraction" in the article.

If folks aren't familiar with their work/internal tech, go check out some of their repos like Seastar. They have some of the most talented systems programmers on the planet writing thin veneers over kernel and hardware APIs to squeeze out every ounce of performance.

https://github.com/scylladb/seastar

You want to talk about getting nerd-sniped to work somewhere if you're into performance/low-latency systems and databases =P

I know it's beside the point, but I just had to share because I thought that was funny


One takeaway for dealing with what the author calls misalignment seems to be that abstractions should be reversible, that is, easily rolled back to the base layer, without disrupting other parts of the system. This facilitates swapping out one abstraction for another.

Another way of putting it is that if you lay out all the dependencies in a system, it should look more like a tree than a graph, with the abstractions closer to the leaves than to the root.


Path dependency makes that often difficult / impractical.


The problem is that application performance doesn't only depend on the thing you want to do; it also depends on the low-level implementation details and the workload, which the programmer may not fully understand or be in control of. Some operation may be optimal when executed alone while a better alternative exists for batch processing. Some programs may be optimized for throughput but are really bad for low-latency applications. The designer has to consider these when designing their system, and compiler/library writers can't help much except write more comprehensive documentation, which the programmer will probably not read :).

I think abstraction guides implementation, so it is very beneficial to think about what will be the bottleneck in the application, and whether a certain abstraction will hinder performance or prohibit future optimization. And although a low-level abstraction can in theory allow for better performance if the programmer spends enough time optimizing for it, they often won't, and a simpler interface (with a sane implementation and perhaps escape hatches for performance tuning) may allow for better performance with much less effort.


> I think abstraction guides implementation, so it is very beneficial to think about what will be the bottleneck in the application

I think this way, but in terms of the "performance" of writing the application. What makes it hard to write this program? For example, in high-performance computing, you may need to control memory alignment (for SIMD and cache reasons). That makes it harder to write a program that just implements an algorithm.

So I think you should identify what it is that makes the program hard to write, and pick abstractions that help with that, that give you easier ways of controlling or managing that hard part.


I think many misunderstand/misinterpret abstraction with magic. Abstraction is naming things that share common properties and structures.

Magic is expensive because it adds debts in debugging.

Non-abstraction is also expensive because it adds debts in development and refactoring.

Good abstraction pays later, it reduces debts, but you need to buy it first. It's not expensive if it's well produced and consumed.


The most expensive abstractions are usually due to bad layering, leading to abstraction inversion. This is the phenomenon where a layer of the system ends up reimplementing the abstractions of a lower level that have been imprudently hidden by a layer in between. This is why considering all layers together needs to be a constant concern.


I hit that when working with NHibernate (an early ORM in C# that was a port of Java's Hibernate.)

I found that it was easier to write the ^%$#$^ query myself than to deal with that monster. What was crazier was that the senior members of the team were so afraid of a database that they couldn't conceive of writing a SQL query.

I got stuck on trying to store an enum as an int in a column for about 3 days. Eventually I decided to read the book; and walked away realizing that it was only really useful for extremely complicated schemas. For simple schemas the learning curve was so high that writing boring SQL code was easier / faster.

(Now, Microsoft's entity framework is often "good enough" as an ORM, and has a rather gentle learning curve. It helps that C# has a way to express SQL-like queries on a collection of objects and translate them into SQL.)


Reading the article and most of these comments, I've concluded that no two people are using the word "abstraction" exactly the same. I don't really understand what the author's definition of an "abstraction" is. He seems to call a poor database schema an "abstraction misalignment"; is anything not abstraction misalignment? It feels like we're in an old-folks home where everyone is talking but nobody is conversing. Nobody is agreeing on what "abstraction" means. A better title for this article is "mistakes are expensive", or possibly "bad design is expensive". But of course then nobody reads it, because everyone loves to hate on whatever they've decided "abstraction" means for them.


A CS prof of mine liked to say:

Every problem in computer science can be solved with abstraction other than too much abstraction.


But it's all abstractions. The only issue is who gets to decide. The programmer controls the abstractions they're allowed to code, but everything predefined that they are forced to work with is "specs" that someone else abstracted for them. But even the programmers who worked on the specs are dealing with their own set of "specs" forced upon them by someone else. And these form the abstraction layers.

And if you're working on accounting software, then many of the abstractions you're working with are defined by accounting practices and the IRS. Your job becomes to translate them, and the logic, into software that can automate them, etc.

Whenever you're forced to deal with something, it's expensive. Whenever you're forced to do something, it's expensive, especially when considering the cost of getting it wrong.


Abstraction has a slight cost, but that doesn't inherently mean it's very expensive. The cost of abstraction is generally worth buying because it allows you to write code in more independent blobs, which reduces complexity and thus bugs and potential bugs.

But using abstractions to "solve" a hard problem is dearly expensive. By "solving" I mean pushing up the hard parts one layer at a time until you eventually can't avoid solving it for real. At best, maybe you're lucky and someone else has to solve it, by working through all the layers of abstractions you built on your way to negotiate yourself out of the hard spot. But those abstractions are expensive because they don't reduce complexity nor offer any tangible benefit except keeping their creators in their comfort zones.


"You can't delegate understanding." - Charles Eames

Abstraction means you don't have to think about details. Unfortunately, details matter in software.

As a simple example, how many projects use Hibernate, then run into massive performance problems? Pretty much all of them...because the devs don't bother to learn how their data store works. Then they iterate over 150,000,000 rows with their ORM instead of using, you know, SQL.

It's always a tradeoff, but again, "you can't delegate understanding."


I don't know if this will help (or even resonate) with anyone else, but it's helpful to me to view abstractions as a form of DRY.

All developers have created functions to de-duplicate their code. And all developers have consequently seen how the more code a function de-duplicates, the larger and more cumbersome a function becomes; how many more if statements and safety checks come into play.

Now imagine how complex - how many if statements and safety checks and introspection - have to go in to replacing hand-written SQL with custom constructors with an ORM.


It's unfortunate that you have only ever experienced abstractions which grow in size over time. The best abstractions out there are ones which operate well together to solve a large set of problems with the smallest amount of complexity local to any given abstraction, the smallest amount of total complexity at any given abstraction layer, and the smallest amount of total solution complexity.

Now yes, this end goal is an idealized world: when you build on top of crusty old APIs and crusty old software you inevitably end up with crusty abstractions, and the real world is especially difficult to deal with. That being said, there are plenty of examples out there where complexity at all three levels of detail that I outlined above has been painstakingly minimized. It usually takes a lot more effort to produce a simple solution to a problem than a complex one. This really requires the "alignment" the article talks about.

Speaking of "alignment", ORM is one of the best examples of complete and total misalignment. The sets-of-tuples model of relational databases and the graph-of-objects model of object oriented systems are so seriously at odds with each other that there isn't a single ORM out there which successfully fully resolves the mismatch without severe abstraction leaks. (No, the fact that you can successfully use an ORM without abstraction leaks at the cost of severe performance degradation doesn't really matter.)

On the other hand, large parts of the design of a project such as Plan 9 are extremely aligned.


You're right, I have never seen an abstraction which hasn't grown with time. I think that's because an abstraction project either ends or receives constant feature requests. Kind of a grow-or-die mentality as applied to software development.


It rarely happens in for-profit situations, but people are slowly wising up to the cost of technical debt, especially for longer-term projects.


For the mathematically minded, this is not about abstractions. This is about technology choices! One more interpretation of an overloaded word that means a very specific thing.


Agreed, they should probably include their definition of 'abstraction' in the post.


I would've thought that everyone who has done any programming at all would be familiar with the term as used in the article—why would this not be the case?


Choosing the right database technology for your application has nothing to do with abstractions as far as I understand the term. Even if you're just going with the dictionary definition, that doesn't makes sense. An ORM is an abstraction, not a database.


Here's an example of a database abstraction choice: the data model.

Do you treat data as a series of normalized row-based data tables with joins optimized for strongly consistent OLTP? [PostgreSQL]

Or as a denormalized row-based key-key-value single table optimized for eventually consistent OLTP? [ScyllaDB]

Or as a denormalized column-based single table optimized for OLAP? [Pinot]

Or as a denormalized key-value optimized for in-memory caching? [Redis]

And so on for document stores, or property graphs, etc.

Each of these design choices then gives way to the abstractions of implementation.


I think that's part of the problem with the use of the word in programming circles: it means different things depending on who's talking.


And no abstraction is perfect, since by definition an abstraction hides some details of the layer beneath. A good abstraction is one that allows you to not look under the hood most of the time. One can be happily writing code in their favorite programming language until they want better performance and start looking into CPU caches, branch prediction, etc., which "break" the nice abstractions provided by the OS and high-level programming languages.


Abstractions are at their worst when they cross bounded contexts. A customer in a billing system is not the same as a customer in a sales pipeline system. By merging disparate purposes for seemingly technical benefit, we often create complexity where none need exist.

Domain Driven Design helps wall off inappropriate and costly abstractions.
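To illustrate the bounded-context point, here's a minimal sketch with hypothetical types (not taken from any real system): each context keeps its own model of "customer", and only the shared identity crosses the boundary.

    -- Hypothetical types for illustration: the billing context and the sales
    -- context each model "customer" for their own needs; only CustomerId is shared.
    newtype CustomerId = CustomerId Int deriving (Eq, Show)

    data BillingCustomer = BillingCustomer
      { billingId      :: CustomerId
      , paymentMethod  :: String
      , outstandingDue :: Double
      } deriving Show

    data SalesLead = SalesLead
      { leadId        :: CustomerId
      , pipelineStage :: String
      , lastContacted :: String
      } deriving Show

    main :: IO ()
    main = do
      let cid = CustomerId 42
      print (BillingCustomer cid "invoice" 120.0)
      print (SalesLead cid "negotiation" "2022-12-01")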


This post makes me wish more research was done on exo-kernels: https://en.wikipedia.org/wiki/Exokernel

The concept makes dealing with some of the topics in this post much more logical and easier, in my opinion.


I know this is about back-end, but it partly describes my issues with front end - the industry standard abstractions save you very up front development time, but they increase the project complexity massively.


I think front end is obsessed with abstraction and DRY. Everything has to be a library or npm module, and people keep writing their own versions of the same stuff, most of the time because they didn't like the name of a function or the order of its arguments. Complexity in front end is definitely increased by the obsession with abstraction, when it should have reduced it.


I think he has a better take on it.

https://m.youtube.com/watch?v=hOrpppzEX14


I think it is too common that you have to work with abstractions that hide functionality in the underlying tech.


Even though many people call abstraction 'the essence of programming', it is rarely fully understood. Maybe no human on the planet really understands it. I've long had an essay in me about this topic that I really need to put to paper soon.

Here is the brief version of that essay:

* Programming is about building theories/models of the problem. Source code largely has no value by itself [1]

* It is very difficult, if not impossible, to transfer these models/theories between humans (see the essay I linked below for details)

* Because of this difficulty, programming is essentially teaching (when writing program code, we're teaching other people of the problem domain, and models that fit that problem domain)

* Another way to phrase this: All programming is building user interfaces. When you're writing, say, a function, you're writing a more abstract UI for the next programmer or yourself

* The tools we have for writing these UIs are terrible - taking the above example of writing a function, what are the tools you have to communicate this idea to the next person? A single string of characters (the function name). In a typed language you get a bit of extra info because of the type information (I believe this to be the main advantage of type systems).

* The big question is: What would a better UI for communicating abstractions look like?

* I don't know the answer to that question but I have a hunch that it has to be bi-directional. If you've ever worked with a GUI library that has a visual editor, you know how awful they are, unless there is also a representation of the same GUI in code. This bi-directional mapping of code and GUI makes it very easy to understand the two different ways of looking at the problem. I think something like this is needed down to the very lowest levels of abstractions.

* Another way to phrase this: Imagine two different scientific models for the same problem. For example, for the model of an atom, its protons and neutrons, there is the more simple Bohr Model, which is completely wrong, given our current knowledge, but still very useful in many modern calculations. But in certain situations, a more accurate model is needed, which takes quantum mechanical effects into account. I see an analogy for programming here: In most cases, a simplified model suffices, but as more performance is needed, a more complex model is required. The question is, how can we easily teach someone the more difficult model, once they've understood the simple one? And the other way around (which is often also not easy, since simple is not the same as easy).

If you have any thoughts on what I've written, any at all, I'd love to hear from you (I'll watch this thread, or find my email in my bio).

[1] https://hiringengineersbook.com/post/autonomy/


> it is rarely fully understood

I beg to differ. Case in point: type classes, the Monoid type class, and Monoids as the mathematical concept; all well understood, proofs of their existence and properties are stated exactly, and when implemented by a compiler behaves exactly as expected when the program is executed. This is precisely what people mean when they say "abstraction is the essence of progamming"...

... when they understand what abstractions are and use a precise definition.

Unfortunately there exist many programmers in the world who do not use precise definitions and don't know how to state mathematical laws or invariant properties of relations. They too use the word "abstraction" and they often mean... whatever it is they mean. It differs from person to person and is often used when they're waving their hands and trying to make a point.

Update: The overwhelmingly vast majority of programmers don't think about integer representations and how arithmetic is implemented these days; some do and that's fine, but the software world continues to ship vastly complex programs and systems without having to care about it and everything still works. That's abstraction at work.
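For readers less familiar with the Haskell vocabulary in this subthread, here's a minimal sketch of the abstraction being referenced, assuming a recent GHC where (<>) and mempty are in the Prelude (simplified: in GHC's actual base library, (<>) comes from the Semigroup superclass of Monoid):

    -- The Monoid abstraction: an associative operation (<>) with an
    -- identity element (mempty). The laws every instance must satisfy:
    --   mempty <> x    == x
    --   x <> mempty    == x
    --   (x <> y) <> z  == x <> (y <> z)

    -- Lists (and therefore Strings) form a monoid under concatenation:
    greeting :: String
    greeting = "hello" <> " " <> mempty <> "world"

    main :: IO ()
    main = putStrLn greeting  -- prints "hello world"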


> Case in point: type classes, the Monoid type class, and Monoids as the mathematical concept; [...] when implemented by a compiler behaves exactly as expected when the program is executed.

Not true in general. For some specific instances (such as fixed-size Int under addition or multiplication[0]), it works[1], but in the general case (e.g. Integer under add or mul, [Foo] under concat) `a <> b` can fail with out-of-memory if a and b are large enough. (This debatably violates closure of `<>` (you could say `undefined`/`error "out of memory"`/etc is a valid element), but indirectly breaks all the other Monoid laws like `(a<>b)<>c==a<>(b<>c)`, since the result is `undefined`(/etc), rather than `True`.)

0: Actually, I'm not sure it's true even then: does Haskell actually guarantee 2s-complement truncation on overflow?

1: Give or take "How many bits does Int truncate to?".


Isn't this why fixed, signed integer types don't have a Monoid instance?

https://hackage.haskell.org/package/base-4.17.0.0/docs/Data-...

For the arbitrary-sized `Integer` type, which is basically a libgmp arbitrary-precision integer, we also don't have a Monoid instance.


> Isn't this why fixed, signed integer types don't have a Monoid instance?

No. Integers (signed or unsigned, fixed or arbitrary-precision) don't have Monoid instances since Haskell requires a single class instance per type, and it's ambiguous which instance should be the "canonical" one.

The types `Num a => Sum a` and `Num a => Product a` have Monoid instances, but it's not clear whether the "canonical" instance for a given integer type should have `(<>)` defined as `(+)` or `(*)`... or `min`, `max`, `and`, `or`, `xor`, `lcm`, `gcd`, or any of several dozen other monoidal operations.

Conversely, `[a]` doesn't really have any reasonable monoidal operation other than `concat`, and most of the existing Monoid instances (most conspicuously Sum and Product above) are even more nothing-else-is-reasonable, if only because of their names.
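A small sketch of that explicit choice, assuming GHC's standard `Data.Monoid` newtypes: you pick the monoid by wrapping the values.

    import Data.Monoid (Sum(..), Product(..))

    main :: IO ()
    main = do
      -- Same numbers, two equally valid monoids, chosen explicitly:
      print (getSum     (foldMap Sum     [2, 3, 4]))  -- 9
      print (getProduct (foldMap Product [2, 3, 4]))  -- 24
      -- A bare `2 <> 3` would be ambiguous, which is why the plain
      -- numeric types carry no Monoid instance of their own.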


Ah ok that makes sense, and if you choose to use `Sum Int` you'd be technically implementing the Monoid abstraction for that type. In practice because the `Int` type is a signed, fixed-size integer it would break the laws for all integers.

In that case it would still be a useful abstraction but you would have to be aware that your `Int` will wrap around for large values which kind of makes `Int` a poor choice.

`Integer` would be better but is still somewhat finite because computers.


> In practice because the `Int` type is a signed, fixed-size integer it would break the laws for all integers.

No. `Sum Int` (unlike, pedantically, `Sum Integer` or `[a]`) is a valid (non-leaky-abstraction) monoid. It adds `Int`s, which is a monoidal operation that satisfies `(a<>b)<>c==a<>(b<>c)` for all (2^(3*MACHINE_BIT_WIDTH) distinct triplets (a,b,c) of) Ints, due to 2s-complement truncation on overflow.

> your `Int` will wrap around for large values

Yes, that's what `Int` means.

> `Integer`

will fail to terminate when adding sufficiently large values (in theory with an out-of-memory error, although in practice operations just get slower and slower until you abort them with ^C or a timeout), a property not shared by addition of `Int`.


The main reason Haskell does not define Monoid instances for numbers is that there are two equally valid instances: with 1 and * or with 0 and +. So, instead, there are two newtypes defined:

* `Sum` whose `mempty` (identity) is `Sum 0` and `<>` (combining operator) is multiplication; and * `Product` whose `mempty` is `Product 1` and `<>` is addition

You have to choose one explicitly. That is, you can't say `2 <> 3` and expect 6 (or is it 5?). Instead you have to say `Product 2 <> Product 3`


> * `Sum` whose [`<>`] is multiplication; and * `Product` whose [...] `<>` is addition

... Uhh?


Parent probably wanted to create two bullet points with the two asterisks.


I was actually questioning the "Sum is multiplication, Product is addition" part, although now that you mention it, the lack of a line break is also a (less significant) problem.


I'm one of those programmers you mentioned. Could you help me? I'm genuinely trying to learn more about this. What is the precise mathematical definition of the term abstraction?

Any links to good resources are appreciated.


> Mathematical abstraction is the process of considering and manipulating operations, rules, methods and concepts divested from their reference to real world phenomena and circumstances, and also deprived from the content connected to particular applications.

https://journals.openedition.org/philosophiascientiae/914?la...


Thanks for the link!


Those abstractions are still leaky, e.g. it's entirely possible that an optimizing compiler could apply correct monoid laws and end up with a program that has vastly different space usage.
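One hedged illustration with plain Haskell lists: the associativity law guarantees both expressions below denote the same value, but the left-nested chain repeatedly re-copies prefixes, so a law-preserving re-association can change time and intermediate space even though the abstract result is identical.

    -- Same value by the associativity law, different cost profiles:
    leftNested, rightNested :: [Int]
    leftNested  = (([1 .. 1000] ++ [1 .. 1000]) ++ [1 .. 1000]) ++ [1 .. 1000]
    rightNested = [1 .. 1000] ++ ([1 .. 1000] ++ ([1 .. 1000] ++ [1 .. 1000]))

    main :: IO ()
    main = print (leftNested == rightNested)  -- True: equal values, unequal work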


Without abstraction it would be difficult to write a proof that your implementation of the monoid laws stay within specified bounds.

It's not "leaky" if the specification doesn't say anything about the space bounds. That's a separate concern from "abstraction" itself.

Update: The idea of a monoid is divested of any real-world implementation and that's what makes it useful (and consequently also hard to explain). The important thing is that there are laws to how the operations on monoids compose and relate to one another that must be maintained no matter how they're implemented in an actual computer.


I've had similar thoughts. Extremely condensing things, I'd see lots of value in:

* Better code folding, so you can cram more info into source code but also hide it easily.

* Function signatures with "types" that are arbitrary predicates and optional generators.

* Function signatures with properties e.g. this binary function is commutative over type X.

* Combine those and you can name algebraic structures.

* "Enforce" this all through property based testing.

IMO, this would make code that's much easier to reason about than untyped code while being more approachable and flexible than other typing systems.
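A rough sketch of the "enforce this all through property-based testing" idea, assuming the QuickCheck library; the property names and the `combine` function are made up for illustration:

    import Test.QuickCheck

    -- A hypothetical binary function whose signature we would like to
    -- annotate as "commutative and associative over Int":
    combine :: Int -> Int -> Int
    combine = (+)

    prop_commutative :: Int -> Int -> Bool
    prop_commutative x y = combine x y == combine y x

    prop_associative :: Int -> Int -> Int -> Bool
    prop_associative x y z = combine (combine x y) z == combine x (combine y z)

    main :: IO ()
    main = do
      quickCheck prop_commutative
      quickCheck prop_associative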


Bad title.

The article clearly says: Abstraction *misalignment* is expensive.


no duh. now lets get back to the real world where all the "abstractions" are poor abstractions AND inefficient.


In most cases time and capital are more expensive and abstraction, well-used, is the cheapest technical debt.


Not if ChatGPT creates it.


premature abstraction is a very common anti-pattern


your abstraction is more expensive than mine


Hey talking heads.


I’d argue that abstraction is cheap. You should do it more. As often as possible.

Consider a world where the software industry state-of-the-art is completely nascent and isolated to every owner of a computing system. You buy these huge refrigerator-sized machines and get a bunch of manuals with them but nothing else. You have no operating system. No compiler. Nothing. The expense for this project can only be funded by a large university or government. The timeline to deliver is measured in years.

In today's world? We write programs that generate new programs, that emulate entire classes of machines, and we ship software projects in days, weeks, and months.

It’s so vastly complex that it’s a wonder it works at all let alone so well.

Abstraction is what makes it all work without toppling over when someone, somewhere in the stack, makes a slight change.


This sounds like an application of the risk aversion/risk seeking that Kahneman highlights in "Thinking, Fast and Slow".

Specifically... each of these individual decisions (wrt abstraction, in this context) was made to locally optimize some issue or other (speed, simplicity, etc).

But when taken in gestalt, they're overall not only not optimal, but downright bad

Most of those decisions are made by people far too close to a tiny portion of the problem to understand the implications of their choices.

Alternatively, it's an application of the blind men and the elephant story: yes, it's like a rope, a fan, a tree, a hose, and on and on.

But it's none of those, it's an elephant


You’re speaking to impedance mismatch which is one of the classic blunders.

Systems that don’t use the same jargon as the user base end up having substantial bugs, especially ones the authors insist are features. The abstractions are a bad fit for the problem domain and they break things.

I’m in a project that did that massively, at the hands of both architectural astronauts and another common blunder: people who seemed to think the concerns of the user base are beneath them tend to create a fantasy world for themselves where they can pretend they work on something more esoteric than the petty concerns of the people who pay their salaries.

Because of the bullshit (and more importantly, the top-heavy sources of that bullshit) we’ve lost a lot of the better people who could have fixed that situation. Meanwhile we are also trying to expand into a new industry, and I can’t help but wonder if we would have been in trouble even if our abstractions had matched our core competency. I would have needed all of those missing people to make the refactors necessary to do that work, so the agents of chaos and delusion are a little vindicated.

I have a couple of things I want to button up but they are all fast approaching. I don’t expect to be here in six months. I’ve started fantasizing about my exit interview. In which I will probably suggest that my team is too powerful and isolated from the end user and so needs to be disbanded, its members dispersed into groups one and two steps closer to the users. So as to concretize some of the code and confront them with the day to day struggles of internal customers, dealing with the batshit parts of the code.


The title needs work too. People will undoubtedly assume that reducing abstraction will make things cheaper.


Eh, some people might do that, but, really, titles like that beg you to read the article (or at least it's why I read it). The article absolutely does not say anything like "reducing abstraction will make things cheaper".


Sometimes reducing abstractions will make things cheaper.

Some regular traps that I see people run into with abstractions:

- building around scenarios that they can be pretty confident that they will never need to handle.

- building abstractions that are larger and more complicated than the thing they are abstracting (this tends to happen when people build abstractions of abstractions).

- building abstractions before they have a proper understanding of what they're abstracting and what they'll need to encapsulate.

In many of those cases, reducing abstractions (even just temporarily) reduces complexity.

When I'm building purely personal projects, I don't use abstractions to help me deal with directory structure on Windows, because I don't use Windows, and that would be additional complexity for no benefit -- so it's simpler for me to just work with the lower level OS paths.

When I start writing an abstraction I look at the amount of documentation I'm generating, and if I'm generating more documentation than it would take to explain the underlying system, I look to see if there are concepts that I can remove. I worked at a company where our build process became considerably simpler when we stopped using high-level build tools like Gulp/Grunt and switched to writing simple Node scripts, because it was easier to debug what those scripts were doing. We simplified that process even further by occasionally just dipping into Bash scripts.

Working with low-level concepts for a while often also gives you better understanding of what you need to abstract. I've worked with codebases where the abstractions all get built first, and it's not uncommon for those abstractions to be built around tasks that are pretty simple and easy to do with lower-level code, at the same time that the abstractions completely ignore the really difficult tasks that are very annoying to do. And once abstractions get baked in, it was time consuming and difficult and expensive to pull them out and rewrite them.

Going back to the scripts above, our build process got better because we made it simple to begin with -- a set of scripts, rather than a large established pipeline -- and then as we identified pain points, we started abstracting those pain points away. That allowed us to not waste time rewriting abstractions over and over and instead to have targeted small interfaces that helped us with the actual painful parts of building and deployment.

It is surprisingly common for software to be over-abstracted to the point where it is more complicated to deal with than it would be otherwise. I mean, heck, this comes up in web development all the time, it is one of the primary criticisms people have of the JS ecosystem -- that it overcomplicates development. In many cases those complications exist for reasons, they solve real problems that people have had. But also in many cases, someone's individual blog doesn't need any of that, and it would be cheaper and easier for them to build something smaller and simpler. If I can build a site that is one HTML file and one CSS file, and I know that's all I'm going to need, then it's overkill to try and set up a bunch of abstractions on top of that.


Indeed.



