- What if we ever need to change database server or driver? Proceeds to write a layer to abstract data access
- We might have different login forms one day? LoginFormFactory it is
- This code hits our KV Redis/memcache/NoSQL by calling the driver lib directly? Can't have that, I'll write a CacheStorage or DocumentStorage layer.
Over time, this defensive mentality produces codebases that are so bloated that no one on the team can comfortably grasp all the moving parts, despite the project not being rocket science.
At that point, devs would rather quit, or get a substantial raise in order to keep going, and ultimately end up leaving to start a new project. But this time, they'll be sure not to make the same mistakes. This time, they'll abstract even more, so that "when one part of the project becomes bloated, it can easily be rewritten thanks to the abstractions". And the cycle continues.
Overly abstract code could just be a sign of a programmer working through a problem that was not well understood (either by everyone or just themselves). The second or third time they do the same task, the code is bound to be less abstract because they are more comfortable in dealing with interdependencies up front. Suggesting refactoring or rewrites can work well in that case.
For example, hiding POSIX-style select/poll vs. Win32 completion ports behind a common abstraction is something I tried recently, and it turned out to be HARD. I never completed it. Why? I started with the abstraction without being familiar with the problem. All I knew was that I wanted to experiment with "game engine design".
If I'm writing something for a known environment that's unlikely to change, it seems like a waste of time.
I really like the Dijkstra quote, "The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise."
Usually I even wrap libc and POSIX functionality in my C programs. Advantages include: (1) not having to mess with library-specific types such as size_t, time_t, DIR*, and so on, which often don't fit my own model (e.g. dealing with unsigned size_t vs. signed integers is painful) and often need small fixes to be portable; (2) compilation is a lot faster, since I don't need to include tens of thousands of lines from system header files; (3) without the wrapper, huge portions of the project would depend on the library instead of only the wrapper; (4) you tightly control access to the library, which makes it easier to troubleshoot, and you can set up your own strategy for managing library objects in a central place.
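A minimal sketch of what such a wrapper can look like (all names here are hypothetical, not from any particular project): the rest of the codebase sees only plain ints and one opaque struct, and the size_t-to-int conversion happens at a single well-defined boundary.

```c
#include <stdio.h>
#include <limits.h>

/* The rest of the project sees only this struct and plain ints;
   FILE*, size_t, and errno stay behind the boundary. */
typedef struct MyFile { FILE *handle; } MyFile;

enum { MY_OK = 0, MY_ERR_OPEN = -1, MY_ERR_READ = -2 };

int my_open(MyFile *f, const char *path)
{
    f->handle = fopen(path, "rb");
    return f->handle ? MY_OK : MY_ERR_OPEN;
}

/* Reads up to 'count' bytes; returns bytes read as a plain int.
   The size_t conversion and overflow check live here, nowhere else. */
int my_read(MyFile *f, void *buf, int count)
{
    size_t n;
    if (count < 0)
        return MY_ERR_READ;
    n = fread(buf, 1, (size_t)count, f->handle);
    if (n > INT_MAX)
        return MY_ERR_READ;
    return (int)n;
}

void my_close(MyFile *f)
{
    if (f->handle)
        fclose(f->handle);
    f->handle = NULL;
}
```

Client code then only ever includes the wrapper header, not <stdio.h>.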
Wrapping things for the sake of it is a recipe for unnecessary verbosity and a hard-to-learn codebase, as things get obfuscated and take longer to trace through.
In C, your point 1 makes me think your code is unsafe, if you're having trouble keeping track of types and signedness. Point 2 is fair enough, but I think a lot of C headers are granular enough that it's irrelevant, especially given how fast everything is these days. Gone are the 8 hour builds of old.
Point 3 is precisely the type of premature interface-isation that's the problem here. If you're never going to change it, then it doesn't matter whether huge portions of the code rely on the library or on your wrapper. It's basically the same thing, only you've put in extra work to wrap something in another layer for no gain. Point 4 is much the same.
For example, I have a FreeType backend in my code, and I have a single file which implements a font interface (open font file, choose font size, draw to texture in the format of my app) by interfacing with Freetype. It returns my own error values, prints warnings with my own logging framework, puts the data in my own datastructures, etc. All things that Freetype could never do since it does not know my project.
I have an OpenGL backend, and I make sure that my geometry and math stuff, UI drawing code, and what not, does not depend on OpenGL, or OpenGL types. (Did that once when I learned OpenGL, and it was terrible). So now I have basically one file which is dependent on OpenGL, and it's there to simply put my internal datastructures on the screen, in a very straightforward way.
Same goes for my windowing backend. I use GLFW currently but I make sure I don't use e.g. GLFW_KEY_A in client code. I define my own enums and own abstractions. It does not come with any cost to simply have my own KEY_A, which means I can swap out the backend relatively easily, and can change my model more easily (for example, make up my own controls abstraction instead of relying on a standard PC 104 keys keyboard, etc).
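The pattern is cheap to sketch. The GLFW_KEY_* constants below are stand-in defines so the snippet is self-contained (their values match GLFW 3); in the real backend file you'd include <GLFW/glfw3.h> instead, and only that one file would know about GLFW at all.

```c
/* Client code sees only these enums, never GLFW types. */
typedef enum { KEY_UNKNOWN = 0, KEY_A, KEY_B, KEY_ESCAPE } Key;

/* Stand-ins for the real GLFW constants. */
#define GLFW_KEY_A      65
#define GLFW_KEY_B      66
#define GLFW_KEY_ESCAPE 256

/* The single GLFW-dependent translation point. */
Key key_from_glfw(int glfw_key)
{
    switch (glfw_key) {
    case GLFW_KEY_A:      return KEY_A;
    case GLFW_KEY_B:      return KEY_B;
    case GLFW_KEY_ESCAPE: return KEY_ESCAPE;
    default:              return KEY_UNKNOWN;
    }
}
```

Swapping GLFW for SDL or a custom backend then means rewriting only this translation function.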
> In C, your point 1 makes me think your code is unsafe, if you're having trouble keeping track of types and signedness.
No, it's safer precisely because I have a well-defined boundary where values get converted. I can check for overflows there, and not worry about the rest of my code.
There is no practical way to deal with size_t in every little place when normal values come as int.
> Point 2 is fair enough, but I think a lot of C headers are granular enough that it's irrelevant,
You bet. Have you ever tried to access parts of the Win32 API? You're immediately in for (I think) 30K lines of header code even if you define WIN32_LEAN_AND_MEAN; otherwise it's maybe more like 60K. It's crazy. Or check out GLEW (which I don't use anymore): it generates 27000 lines of header boilerplate simply to access the OpenGL API.
Add to that that most headers use a standard #ifdef include guard, which means that even though a file's contents are ignored the second time around, the lexer still has to scan the whole thing over and over again.
> If you're never going to change it
How would you know?
> then it doesn't matter whether huge portions of the code rely on the library or your wrapper.
It does matter a lot if there is a semantic mismatch.
You have a target system, and no current plans to change it. You can't predict every change that might happen down the road, and neither should you try - you'll waste a ton of engineering effort, and if change does come, it's almost always in ways you didn't anticipate.
For example, I would point to 3D graphics APIs as a reasonable case for this. All the ones in use today assume a lot of low-level detail about buffers and pointers and strides and attribute flags, which makes building up test cases challenging: you have to consider dozens to hundreds of options, and setting one wrongly will result in no image or a crash.
So, instead, many folks turn to copy-paste of known working examples, give them a minimal amount of additional configuration, and extend that into the frontend that they habitually use, rather than directly accessing the API in question. The full power is still there - it's not really abstracted - but the workflow has been pushed towards the average case.
Reduction of complexity, like your example, is great. Increasing levels of indirection thoughtlessly is adding to it, IMHO.
I was a C coder for many years; these days I seem to be doing Java. There are dependencies in my recent projects that just do what I need, no particular domain translation required. Particularly things like the Apache Commons libraries, which provide well-formed utilities for common operations. It would be a waste of time and energy to wrap them simply for the sake of having a wrapper.
If this sort of thing isn't what you were driving at, then we've just been miscommunicating. I am 100% for encapsulation of functionality into good, discrete modules which provide sensible interfaces and minimal (but expressive) APIs. I just don't like the blind application of "this isn't our code, therefore we must provide an interface".
An example would be in games when you submit an achievement or leaderboard score to a third party such as Apple GameCenter, Google Play Game Services, or other systems like Amazon GameCircle, web, or custom. These libraries are frequently updated, they change per platform, and you will touch them in many places in the codebase, so wrapping them in a progression/statistics/achievements wrapper/facade/adapter is smart from the start.
If the wrapper is tightly coupled, uses third-party library types that can change, or is leaky, then yeah, the effort is moot and you end up with more work.
Another case is where you have some sort of messaging system and want responses/events to be uniform across your system, rather than unique to a platform or even to a company/product that doesn't match your codebase. An example might be wrapping a game recording library, an audio library, or anything else that doesn't fit well in the codebase or complicates maintenance (even just stylistically or standards-wise), or where you only need a small part of the library, such as activating it, common message types, or the data level.
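As a sketch of how small such a facade can be (all names here are invented for illustration): the game calls these two functions everywhere, and each vendor gets its own backend file implementing them, so swapping GameCenter for Play Services touches only that one file.

```c
#include <string.h>

typedef enum { ACH_OK = 0, ACH_INVALID_ID } AchResult;

/* The facade the rest of the game calls: */
AchResult ach_unlock(const char *achievement_id);
AchResult ach_post_score(const char *leaderboard_id, long score);

/* One backend file per vendor (GameCenter, Play Services, ...)
   provides the implementations. A stub backend for illustration: */
AchResult ach_unlock(const char *achievement_id)
{
    if (achievement_id == NULL || strlen(achievement_id) == 0)
        return ACH_INVALID_ID;
    /* a real backend would call the vendor SDK here */
    return ACH_OK;
}

AchResult ach_post_score(const char *leaderboard_id, long score)
{
    if (leaderboard_id == NULL || score < 0)
        return ACH_INVALID_ID;
    /* vendor SDK call goes here as well */
    return ACH_OK;
}
```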
Game engines themselves are massive wrappers around many wrapped systems.
Only a Node.js developer could say that. Maybe it's true when the dependency is left-pad or something.
Experienced developers in other languages will acknowledge that (non-trivial) code that is not your own code, is not your own code. You seldom have a very good understanding of the underlying models, and the library has absolutely no understanding of your own model. And it's very hard to make a change in code that is not your own.
Case in point, I just read the title "Linux fsync() issue..." on the HN frontpage. If you have a large project, say a database, I can only hope that you properly abstracted your synchronization model.
You can abstract the platform layer. It's a lot of work, but might be less work than maintaining a code-base for each platform.
If you are working with a very low-level API, it might be a good idea to abstract it though. You always want to go up one abstraction level, not abstract "sideways". And be aware of the trade-offs, like performance, and also being able to understand the code. For example: developers might know the low-level API because it's common knowledge; if you've then made a (leaky) abstraction, it might be much harder to understand.
> You always want to go up one abstraction level
I don't think there is such a thing as "abstraction levels". As it says in the Dijkstra quote, abstractions are semantic models. Interfacing means translating between models.
Translations between abstractions might or might not be fully realizable. Very often there's a mismatch and the translation is not possible perfectly, in which case it's a leaky translation. And that's ok. Often the only way to deal with reality is to ignore some difficult parts of it, since otherwise the project couldn't be completed. For example, RPC is an abstraction for network requests to be modelled as function calls. This ignores the reality of the unreliability, throughput, and latency, of real world networks.
And that's ok. There might be some situations in which the program does not work in the real world. But mathematically speaking, at least the program is correct (in a very obvious way) with regards to the simpler model which, unrealistically, assumes that RPC works just like function calls.
So, RPC is not an abstraction that is somehow built on top of network infrastructure abstractions. It's only typically translated to the semantic world of networks. That's an important difference.
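A toy illustration of that mismatch (the transport below is a canned stand-in, not a real network): the caller's semantic model is "an ordinary function call", and the unreliability of the wire simply has no place in that signature.

```c
#include <stdbool.h>

/* Stand-in transport: in reality this would be a network round trip
   that can time out, duplicate, or drop the request entirely. */
static bool send_request(int account_id, long *reply)
{
    *reply = 100L * account_id;   /* canned response for the sketch */
    return true;
}

/* The RPC abstraction: callers see a plain function. Latency and
   partial failure are not part of its model; the single error value
   below is the one place the leak shows through. */
long get_balance(int account_id)
{
    long reply = 0;
    if (!send_request(account_id, &reply))
        return -1;
    return reply;
}
```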
But none of your first few examples seem particularly egregious; if done well, they don't seem like they would sink a project.
I think part of the problem is developers underestimating just how hard it is to write an abstraction layer well. The juniors will say "yeah, I could do that in one sprint", then they start to build on top of it, and the problems don't start to show until later, maybe when the dev who wrote it is gone.
I think the correct way to approach a problem like this is to acknowledge the cost of it up front. They are expensive. They don't just have to work. They have to work well and be readable.
When would anyone ever make an inc_pair function? Or a hierarchy of 'animals that lay eggs'? I guess these are supposed to be examples of something, but I'm not sure what. I wouldn't even call them "abstraction". They seem like completely hypothetical examples of how to apply programming language features to non-problems in bad ways.
Without knowing what the designer of these programs is trying to model, it's impossible for me to understand what they're trying to accomplish or why they think applying these language features in this way makes sense at all.
Both of these examples sound like they increase the number of lines of code, for no appreciable benefit, and I recall Yegge's old observation that the main issue with any codebase is simply the bulk of it.
If that were Randall Munroe's next book, I'd buy a copy.
I am mentioning lens because it has a both function:

    incBoth = both += 1

    incBoth :: (Bitraversable p, MonadState (p a a) m, Num a) => m ()

Or you can simply write both += 1 inline.
You can band-aid this to an extent with higher-level "integration" tests, but if you'd done things at the right level of abstraction in the first place you would carry less weight around and wouldn't have to maintain a bunch of brittle tests in the first place.
This is obviously all shades of grey, but if you're mocking out things that aren't I/O you're probably doing it wrong.
If you find yourself vehemently disagreeing with this I'd be interested to know if you've ever had to refactor or simplify a codebase with a bunch of overly-abstracted, itty-bitty things that had very tight coupling via mock behaviour to all their tests, and if so, whether that felt pleasant to you or not. If you haven't then you probably haven't seen the considerable longterm maintenance downsides to this kind of approach and I feel sorry for the poor folk who will inherit your codebase.
Also curious is that many codebases I come across that look like this often have terrible copy/paste mock setup across lots of tests, making the issue even worse when you want to change things.
Those sorts of codebases often end up with people wrapping the abstractions in other abstractions because they're sufficiently resistant to change as a result that that seems easier. This obviously makes everything even more resistant to change (especially as the wrapper abstraction usually depends deeply on all the behaviour underneath it, and the mock set-up to test the wrapper ends up as an exercise in mentally mismodelling how the other components actually behave).
That's not necessarily true. Abstractions with a small surface area (exposure to their 'outside world' - e.g. via function signatures) that are very deep (they hide a lot of complexity) make more complex behaviour much easier to manage.
When the surface area is high and the depth is low is when the overhead of the abstraction tends to exceed its use value.
Perhaps, you simply mean a different thing when you say "abstraction".
If you're writing some database code, having high-level fetchMyEntity() calls mixed with connection/resultset/cursor logic is bad news.
If you're writing something that reads from a message queue and stores the message in a database, the place where the two meet shouldn't know much of anything about either the message queue or the database.
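One way to sketch that in C (names invented for illustration): the hand-off function sees only two callbacks, so it knows nothing about either the queue library or the database driver behind them.

```c
#include <stddef.h>

/* The glue depends on these signatures, not on any queue or DB API. */
typedef int (*FetchMessage)(char *buf, int cap);       /* length, <0 if empty */
typedef int (*StoreMessage)(const char *msg, int len); /* 0 on success */

/* Moves one message from "somewhere" to "somewhere else". */
int pump_one(FetchMessage fetch, StoreMessage store)
{
    char buf[256];
    int n = fetch(buf, (int)sizeof buf);
    if (n < 0)
        return 0;                  /* queue empty: nothing pumped */
    return store(buf, n) == 0;     /* 1 if the store succeeded */
}

/* Stub endpoints so the sketch is self-contained: */
static int fake_fetch(char *buf, int cap)
{
    if (cap < 2) return -1;
    buf[0] = 'h'; buf[1] = 'i';
    return 2;
}

static int fake_store(const char *msg, int len)
{
    (void)msg;
    return len == 2 ? 0 : -1;
}
```

The real queue and database adapters live in their own files; pump_one never needs to change when either side does.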
Obviously, exceptions exist where these rules need to be broken, but I find they're fairly few and far between.
And this is the real trap when dealing with abstractions - maybe the implementations of two different operations are the same, but if the semantics are different then de-duplicating them creates an unnecessary dependency between the two. The more cross-links in your application structure, the harder it is to do anything without breaking everything.
One of the truly remarkable things about Haskell is that its approach to structural abstractions allows you to break out of this problem (though you end up paying a different cost for it)
E.g., a quote about the mathematician Carl Gauss:
> Gauss' writing style was terse, polished, and devoid of motivation. Abel said, "He is like the fox, who effaces his tracks in the sand with his tail". Gauss, in defense of his style, said, "no self-respecting architect leaves the scaffolding in place after completing the building".
A similar issue I have is with people constantly 'refactoring'. I choose to say I'm 'factoring' code instead. If you can't name the factors that you're separating, then it's likely you'll change your mind and end up 'refactoring' it. Sometimes you take factored code and factor it further, which I don't have a different name for - it's just more factoring.
It's difficult to address, but imo introducing abstractions should not be the goal of a codebase from the get-go.
There is also a cultural thing around abstractions where inexperienced developers look up to or are fascinated with the complexity of something or someone that brings that to the table.
It's also a common feature of certain so-called architects, probably because they feel they need to apply some advanced techniques.
The thing is, though, that when you have worked with developers or architects who advocate for simple abstractions, and over time that proves to be both efficient and cheap, then you start to doubt complexity altogether.
Also remember that complexity is often not that complex per se, as long as you spend time breaking it down.
And then you have a better fundamental platform for solving what you need to.
Simple code is fast code. And also easier to change.
From your perspective, it means there is also a need to establish agreements on the levels, depth, and ways abstractions in the code are formed. Indeed, I've worked with software where the functions and operations weren't implemented in a messy way per se, but the many levels of indirection, abstraction (and obscurement) made things really difficult to read and a real tail-chaser when it came to maintenance.
Those levels can also make it much more difficult to understand the flow and the operations that are happening, because in many languages you pass references to data objects, so data gets changed in many ways.
Nice article, puts me into thinking mode again! :)
Usually, the established abstractions of the domain are an excellent guide. Even if you think they are sub-optimal objectively, they are easier to reason about for experts in that field.
I like the article's language of containing the "damage" code can do.
A C-style modularization technique I haven't seen used is long functions, with sections' variables' visibility limited by braces.
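For reference, the technique looks like this (a contrived sketch): each section's locals are walled off inside a bare block, so later sections can't accidentally reuse them.

```c
int run_pipeline(void)
{
    int total = 0;            /* the few variables shared across sections */

    {   /* section 1: parse */
        int parsed = 40;      /* visible only inside these braces */
        total += parsed;
    }

    {   /* section 2: adjust -- 'parsed' is out of scope here */
        int adjustment = 2;
        total += adjustment;
    }

    return total;
}
```

You get most of the isolation benefit of small helper functions without the plumbing of passing state between them.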
It's a bit like saying bricks are useless, by limiting the entire argument to a single brick.
Quoting from the blog post:
>It also seems that Go's implicit interfaces were designed to avoid unnecessary abstraction.
Actually, interfaces are often extremely abstract in Go. How much more abstract does it get than io.Reader? It's a thing that you can read bytes from into slices (arrays). The io.Reader abstraction is far simpler than os.File, net.Conn, or even bytes.Buffer (a file object, a network connection, and an in-memory buffer, respectively).
I agree with him that abstractions should be crafted carefully, but not because abstractions are categorically bad. If he thinks abstraction is so bad, why does he write software at all? Software is extremely abstract, even assembly language is an abstraction of what the CPU does. Human language is abstract too: we can talk about trees without having to consider any specific tree.
Abstraction is a fundamental building block of human civilization. Of course, he doesn't actually believe that abstractions are bad. It would be nice if he differentiated between good abstraction and bad abstraction.
On a side note: I take issue with his criticism of mocking. Good abstractions are easily mocked and make the code base easier to understand because you don't have to consider every detail of the application at the same time. On the other hand, mocking concrete objects, rather than abstract roles, definitely complicates things.
Or "header interfaces" that duplicate the API of a concrete class exactly: https://www.martinfowler.com/bliki/HeaderInterface.html
In other words, to add a cost (the possibility of being dismissed/sneered back) to such dismissals.
I find facile dismissals irritating as well, and lord knows this site gets a lot of them. But the way to push back is with a clear, positive defense of whatever was unfairly dismissed. Venting doesn't help; it only invites more venting.
> to add a cost (the possibility of being dismissed/sneered back) to such dismissals.
Given the fact that this is a conversation among strangers, I would assert that it isn't really that effective to just add costs by making discourse less pleasant.
In general, I think a community is healthier when we treat people 25% better than you expect to be treated, to account for the Fundamental Attribution Error and other misinterpretations.
I guess so. Sometimes I'm just pissed off by the easy dismissal, as in "This 5-second basic retort is all you've come up with, and you think you've taken down TFA?".
If someone writes a blog series called “should this be abstracted, what’s the abstraction?” I think we would see some great discussion.
We're unlikely to encounter a problem that is solved with a single function, or as one function per line. But the important thing is to look at what the code should achieve. Most importantly, local decisions should be made based on what the code should achieve in a global context. Local syntactic optimizations are less important than the global picture.
I strongly agree with the Golang authors and with experienced C programmers that it's much better to write a few lines more and be more clear and explicit in exchange. Note that there are some additional lines that increase complexity, but I would argue that lines which contribute to clarity do not contribute complexity. In fact, those investments in additional lines usually decrease the number of moving parts.
Syntactic homogeneity is important so one can easily see what one piece of code achieves in the global context. It does not help if we micro-manage and constantly think about the type of for loop or lambda abstraction or error handling mechanism to use, only to shave another line off.
Unfortunately, looking at the actual problem is what most people forget. Instead the discussions are about languages (filter, maaap, maaaaap), frameworks, libraries, object orientation - without any concern for what these features can do towards reaching a specific goal.
Now that you mention information theory, I want to mention the term "semantic compression", coined by Casey Muratori. I think he has done at least one stream about it, which you should be able to find on YouTube. In general I recommend following him; he is one of the most experienced and no-nonsense guys I've found on the internet.
Keep things 'square'. This is a fuzzy concept and I don't really know how to explain it properly, other than that the effort spent on each layer of abstraction should be roughly equivalent.
Your example is a single 1M-line function or 1M one-line functions. In this case you probably want 100 functions of 100 lines each (and yes, refactoring like this probably saves you 100-fold in overall LoC, so it works out). And your 100 functions are probably nested in a 10-deep hierarchy where they all do roughly the same amount of cognitive work.