

The "Black Box" Disease - andreyf
http://andreyf.tumblr.com/post/321371513/the-black-box-disease

======
alextgordon
Abstractions are a necessity for portability. You can't talk about _"little
heads reading data off the hard disk"_ if your program is expected to work on
SSDs too. Or what about the "quantum-holodisks" that arrive in 2020?

A vague understanding of the software and hardware stack is a good thing -
the stack could change at any time.

~~~
cabalamat
Abstractions are not only necessary for portability, they are necessary for
understanding increasingly complicated software systems.

Say I write a program in Python. Python is implemented in C, but I don't have
to care about C. The C source of the Python implementation is compiled to x86
machine code, but I don't need to know how that works either. Machine code is
implemented using logic gates, which in turn are implemented using
electronics. Again, stuff I don't need to know about.
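
(For what it's worth, the Python layer at least lets me peek one level down
when I'm curious, even though I never have to. A minimal sketch using the
built-in dis module:)

    import dis

    def add(a, b):
        return a + b

    # Show the CPython bytecode this function compiles to - one layer
    # below the Python source, and still far above the logic gates.
    dis.dis(add)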

I write my program using a text editor which uses KDE, and below that, X
Windows. These are both complex software systems, and again I DON'T NEED TO
KNOW ABOUT THEM. Underneath X is the graphics card. This presumably uses some
sort of memory-mapped display, but I don't know the details.

Back in the days of CP/M systems with 64K of RAM, one could understand a
computer system in its entirety. It simply isn't possible now.

~~~
andreyf
Right, and that's exactly the immaturity I'm talking about - you're writing in
a language designed by A interpreted by a program written by B in a language
designed by C compiled to an architecture defined by D, where A, B, C, D are
sets of people who have little or no interest in what the others are doing.
Between KDE, X Windows, and the graphics card, how much code complexity and
hardware is necessary to do what Ivan Sutherland did nearly half a century
ago?

 _Understanding a computer system in its entirety simply isn't possible now._

Precisely! But there's absolutely no reason why it can't be, aside from the
inelegant, weak, and opaque abstractions we're using.

~~~
lmkg
But what advantage is there to be gained? On the one hand you're removing
access to an underlying layer, but on the other hand you're also removing the
need to deal with it, except for a small fraction of edge cases.

If I'm writing a document in MS Word, my concern is what the document says and
how it looks. Will knowing the specifics of hardware drivers for the hard disk
or monitor help me do this in any way?[1]

This is the same argument that C won against assembly back in the day, and the
same argument that C is losing against yet-higher-level languages in the past
decade. The utility you can squeeze out of low-level facilities, past one or
two layers of abstraction, is small, especially compared with the level of
effort required.

You view it as "siloing." Most of us view it as specialization. Programmers
get to focus on the novel aspects of their problem space, and the pieces like
hard disk drivers that are universal get placed in the hands of specialists
whose general solutions are still better than a million people making their
own solutions.

[1] To be perfectly honest, I do structure my docs with the compression
scheme of .docx files in mind, albeit in very vague general terms. However,
this is more vanity than anything else; a Word document being a kilobyte
bigger or smaller has a vanishingly small chance of having a nonzero impact on
anyone's life.

~~~
coliveira
C is not a counterexample, because you can still access assembly if you want
(using conditional compilation or inline assembly), just like every OS does.

Word doesn't give you low-level access, and that is why it is so despised by
technical people. On the other hand, see how LaTeX is so useful exactly
because it gives you the option to manipulate the lower-level structure of
your document.

From my point of view there is nothing wrong with specialization, as long as
you have the option to go low-level if you want or need to.

------
gvb
The "transparent" vs. "opaque" abstraction distinction is excellent, but I
submit it is an example, an instance, of a more important philosophy. The
philosophy can be summarized in one word...

 _Discoverable._

Opaque abstractions are not discoverable: you cannot look inside to discover
what makes the abstraction work. If the abstraction works, life is good.
When (not if) it breaks, you are forced to limp along with a broken
mental model of the abstraction. With opaque abstractions, every time it
breaks, you get "buyer's remorse."

Transparent abstractions are discoverable: if you _need_ to (or want to), you
can open the box and look inside. As long as your mental model matches the
actual functioning of the abstraction, you don't _need_ to open the box. More
importantly, when your mental model vs. the abstraction breaks down, you _can_
open the box and either fix your mental model (likely) or fix or enhance the
box.

Now to expand "discoverable", think of Apple's products (I think they are the
best company at implementing _discoverable_ products). Why do they not provide
a 2 inch thick printed manual with their products? Because they have a
comprehensive Help file[1]? No, it is because their products are
_discoverable._

You don't have to know how to use every feature of the iPhone in order to get
started, you just turn it on and make a phone call. All the useful features a
new user needs are obvious and intuitive. Need to browse the web? OK, there is
an icon that looks like it will browse the web. Hey, look, it _worked._ Need
to do ______? Poke around, ask a friend, ask Google and you discover new and
better ways of doing _____.

As importantly, the _advanced_ features lie quietly in wait. They do not
distract the new user, but they are _discoverable_ when the new user becomes
more sophisticated.

Not needing a manual is just a side benefit of being _discoverable._ The bonus
for the user _and Apple_ is that the user's delight in the product does not
end after they turn it on for the first time. There is no "buyer's remorse."
Instead, they are _discovering_ new features for months, sometimes years,
which results in a long term stream of surprise _and delight_ in the product.

[1] Windows and most Windows software attempts to make their products
discoverable by (a) showing all possible options at once, making their product
incomprehensible, and (b) providing an incomprehensible help file that, if
printed, would be 2" thick.

------
DougWebb
I had a completely different take on this than everyone else commenting: our
software can be treated as an opaque black box too, and that can be
detrimental.

My company recently started a new code review process: to save the reviewers
time, changes are first reviewed by a non-programmer UX designer to verify
that the UI behaves as the UX designer intended. The UX designer tests the
application and either rejects it or approves it. If rejected, it goes back to
the developer for more work, and if accepted it goes to the technical lead for
code review. The theory is that the code reviewer doesn't have to confirm
proper functionality, because the UX designer has already done that. (Our tech
leads tend to be overloaded.)

The problem is that the application code is a black box for the UX designer.
She can give it input and see if the output looks right, but there's no way
she can determine if there is any particular input that will produce wrong
output. At best she can expend a large amount of effort giving random input,
but unless she can cover the entire possible universe of input (generally not
possible) she can't guarantee correctness.

The code reviewer doesn't have this problem; the application is not a black
box, so he can examine the code to determine the common cases and edge cases,
and find unhandled cases. A code reviewer can determine which input will
likely break the application by inspection rather than by trial and error.
That's much more efficient, and we have a lot more confidence that correctness
can be guaranteed.

Unfortunately, the code reviewers aren't doing that... they're not looking at
correct functionality at all, because the UX designer supposedly is doing that
for them. The process does not recognize and accommodate the opaque black box
problem, and as a result the Quality Assurance team (who have the same
problem but do a lot more random testing) are finding bugs we completely
missed.

Interesting note: our QA team have been dealing with this problem for a long
time, and they've been quietly self-training as programmers in order to be
able to read and understand our software changes. This has made them a very
effective QA team, especially when they include suggested diffs in their bug
reports. They're a pleasure to work with.

~~~
andreyf
If the tech leads are overloaded, peer code review may be a good solution.

~~~
DougWebb
The trouble is a lack of peers. For many years our codebase was completely
maintained by no more than three people, and often just one or two. Two of
those people are gone now (I'm the third) and we've been trying to increase
the number of developers who know the codebase well. The tech leads are the
in-house developers who have a range of experience on this codebase, from two
years to a couple of months. The bulk of the developers are at an offshore
subcontractor with a very high turnover rate; our last project started with a
team that, other than its tech lead, mostly had no experience with the
codebase.

So yeah, two developers can review each other's code, but if neither of them
has much experience with the codebase they can't maintain quality very well.
(We get a lot of cut-and-paste and magic-incantation coding from the
inexperienced developers.) Slowly but surely we're improving things, and while
I'm still the only developer with broad knowledge of the entire codebase, the
others have become experienced with different parts of it, so I'm not the
bottleneck I used to be.

One nice thing about my company: while we've got a big-corporate bias for
process-at-any-cost, we're pretty flexible about trying out different
variations on the process with each project in order to improve our approach.
It's slow going, much slower than it would be in a startup, but it's not the
brain-dead process quagmire a lot of companies have either.

------
scotty79
This is one of the reasons that open source is a great thing. If you
encounter a bug you can go as deep as you need to. No inaccessible black boxes
in your way.

When I encountered a bug using MS software in my projects, I could go only as
far as their bug tracking site, which already had my bug, and it was more than
a year old. This happened to me twice.

On the other hand, when I found a bug in prototype.js I could fix it myself.
They even accepted my patch and included it in a future release.

~~~
andreyf
It's not realistically possible, though: many software components have
millions of lines of code - code bases that would take years to really
understand.

~~~
scotty79
In practice, annoying bugs can be caused by a few easily understandable lines
that no one has bothered to look into so far.

------
msluyter
One pertinent example of this: Tom Kyte (Oracle guru) argues against treating
Oracle as a black box. If you try to ignore Oracle's internals, you'll tend to
end up doing things in extremely suboptimal ways. Say, for example, by not
using bind variables in your queries. The more I use Oracle the more I agree
with Tom Kyte.
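
For instance, the difference is just this (a rough sketch in the style of
Python's cx_Oracle driver; the cursor, table, and variable names are made
up):

    # Literal concatenation: Oracle sees a brand-new statement for every
    # distinct value, so it hard-parses and plans each one separately.
    cursor.execute("select * from t where t.col = '" + value + "'")

    # Bind variable: one shared statement, one cached plan (and no SQL
    # injection as a bonus).
    cursor.execute("select * from t where t.col = :val", val=value)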

~~~
ntoshev
Oracle should keep statistics on the queries it gets and infer which of the
parameters should be treated as bound parameters in a reusable query plan. I
wonder why it doesn't. It would still need to parse queries, but I believe
making the query plan is the expensive part. Also parsing could be done by the
client machine's driver.

~~~
msluyter
So, you're saying given queries like:

    select * from t where t.col = 'foo'
    select * from t where t.col = 'bar'
    select * from t where t.col = 'baz'

Oracle should infer that foo|bar|baz should be parameterized? Interesting
idea, though this seems like a matter of matching parse trees, and I believe
that's a relatively expensive operation, roughly polynomial time, iirc.

~~~
ntoshev
You don't have to infer all semantically equivalent queries, comparing queries
as linear streams of tokens should do.
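
Something like this, say (a minimal sketch in Python, hand-waving the real
SQL lexer with a regex):

    import re

    # Replace string and numeric literals with a placeholder so that
    # queries differing only in their literals share one plan-cache key.
    LITERAL = re.compile(r"'[^']*'|\b\d+\b")

    def plan_cache_key(sql):
        return LITERAL.sub("?", sql)

    plan_cache_key("select * from t where t.col = 'foo'")
    # => "select * from t where t.col = ?"  (same key for 'bar' and 'baz')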

------
rpledge
In theory I kind of agree with this, but the reality is that most programmers
never "master" the trade. Abstractions that help people get their job done by
simplifying the task are useful. Remember that not everyone is writing the
Linux kernel.

I think the real value here is that to master a system one needs to look
beyond the abstractions at some point.

~~~
andreyf
So my hypothesis is that while this is true now, it won't be once the industry
matures.

~~~
Quarrelsome
I really hope you're wrong.

The premise is that after many years of development, the industry is still
unable to create abstractions effective enough not to "leak". That's a tragic
state of affairs, especially when there is still a huge amount of content to
learn to achieve mastery.

I personally expect maturity of the industry to arise when we make
abstractions effective enough that those with less mastery are able to
construct and run their own programs with fewer problems... Wait, did I just
state that something like VB6 helps mature the industry? Shoot me, please ;)

~~~
andreyf
_I expect a maturity of the industry to arise when we make [effective
abstractions]._

Right, that's a great way of saying what I meant. To expound: effective
abstractions must be "powerful" in the sense that they let you express ideas
in broad strokes, and also "clear" in the sense that they wouldn't have
unexpected properties (leak). Some of the macros I've seen in PG's Arc are
good examples: they're intuitive and quick to look up. The problem with macros
is that they aren't very polymorphic - you can't set default variables or pass
in custom key/values as you can in Python's functions, for example.
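
(Concretely, the Python behavior I have in mind:)

    # Defaults and arbitrary key/values in one signature - something a
    # plain macro can't easily mimic.
    def connect(host, port=80, **options):
        return (host, port, options)

    connect("example.com")                        # defaults kick in
    connect("example.com", port=8080, retries=3)  # custom key/values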

~~~
gruseom
_The problem with macros is that they aren't very polymorphic - you can't set
default variables or pass in custom key/values as you can in Python's
functions, for example._

You can't? I think I do this all the time (unless I'm misunderstanding).

Good post, by the way. I have been thinking about similar things recently and
I agree with you. The idea that we'll build up a tower of abstractions that
lets us sit on top and do powerful things forever is in some ways a gigantic
mistake (q.v. "The Princess and the Pea"). You can do things with smaller,
simpler white-box environments that you can't do with large, complicated
black-box ones. Something like this is the principle behind what Alan Kay's
group is working on, too. I wonder if the industry will ever absorb this.
Based on what we know of software history so far, it would probably take some
kind of killer app built on top of the radically new thing.

------
10ren
In Administrative Law, "powers" are often granted to government ministers. A
power is the ability to do something (e.g. reject an application). They are
often cast in this form:

    
    
       The minister may reject the application if:
         (i). <specific situation 1>
         (ii). <specific situation 2>
         (iii). for some other reason
    

It's quite a bizarre format, carefully constraining the choice, then opening
it up utterly, like Douglas Adams' editorial expenses joke.

I mention it here, because the first two reasons are like an abstraction; the
last reason is like a transparent abstraction. It enables you to get things
done, even if the drafter of the legislation hadn't considered the particular
situation you find yourself in. It's an admission of pragmatic humility.

However, I must add a counterpoint. I really love the abstractions that do
work supremely well - like arithmetic, field access and so on. We probably
think of them as fundamentals rather than abstractions; the greatest testament
an abstraction can receive.

~~~
ggchappell
> ... Douglas Adams' editorial expenses joke.

Haven't heard that one. And, alas, Google is no help. Could you offer any
pointers (or just state the joke)?

~~~
10ren
Can't find it. Will look again later. It's basically the same as above, but
for when the Guide's writers can charge things to their expense accounts (or
maybe for granting a writeup, and journalistic integrity?), and the last line
is "You really want to."

I think it's followed by "Ford tried to avoid that one, because it always
involved giving the editors a cut."

Ah! Now I have those extra words, Google helps:

 _The Hitch Hiker's Guide to the Galaxy is a powerful organ. Indeed, its
influence is so prodigious that strict rules have had to be drawn up by its
editorial staff to prevent its misuse. So none of its field researchers are
allowed to accept any kind of services, discounts or preferential treatment of
any kind in return for editorial favours unless:

a) they have made a bona fide attempt to pay for a service in the normal way;

b) their lives would be otherwise in danger;

c) they really want to.

Since invoking the third rule always involved giving the editor a cut, Ford
always preferred to muck about with the first two._

<http://flag.blackened.net/dinsdale/dna/book4.html> (Chapter 5) - I fixed a
typo when I copied it

------
morphir
<EDIT: I toned down the harshness.>

Abstractions are key to every good design. Let me try to explain in more
detail:

Abstractions can be built in two ways: (I) top-down, (II) bottom-up.

A top-down approach is described in SICP as 'wishful thinking'. We simply
assume we have the function, and then we drill down and let George (the
programmer) worry about the implementation. Taking the UML approach is not
necessary - traditional black-box abstractions will suffice.

A bottom-up approach, or Domain Specific Language (DSL), is not very well
supported among the typical boilerplate languages today; a small language
like Scheme is perfect for DSLs because of its smallness.

You design top-down - and you implement bottom-up.

Naming functions is an art of its own and is key to good software
engineering.

Personally I'm sick of programmers who don't understand the art of
abstraction - as much as I'm sick of designers who don't know the limits of a
computer or the capabilities of their language.

A note on UML: UML got one thing right, and that was the 'Use Case'. A proper
use case forms the foundation of a good top-down design.

~~~
wingo
I don't think Andrey said that abstractions are useless.

I agree with most of your other points, but it's difficult to get past the
confrontational tone. Be more charitable :)

~~~
morphir
Forgive me, I see that I have misinterpreted the article (I skimmed it).
Reading it again, I see that I do agree.

------
ramidarigaz
Anyone have examples of 'good' abstraction vs. 'bad' abstraction?

Edit: Missed ntoshev's comment.

~~~
andreyf
I had examples but cut them for fear of starting tribal conflicts, and
because brand-name examples detract from the logical point. The ones I cut
are:

\- The JVM is a "bad" one, as most Java programmers aren't expected to ever
think about JVM internals. Languages developed on top of Parrot and LLVM are
"good" ones, as they spend considerable effort making their internals
accessible.

\- ORMs on top of *SQL are "bad" if the user doesn't understand their
implementation. Flat files and MongoDB are "good", assuming the user does (as
I imagine most do).

\- TCP/IP used as a byte stream is "bad"; used with an understanding of
packet ordering/repetition/correctness it is "good".
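
(The TCP point concretely - a minimal Python sketch: recv() hands you
whatever bytes have arrived, not "one message", so reading transparently
means knowing to loop:)

    import socket

    def recv_exactly(sock, n):
        # TCP is a byte stream: a single recv() may return fewer than
        # n bytes, so keep reading until the full amount has arrived.
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed early")
            buf += chunk
        return buf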

Rails' templates are transparent abstractions over HTML compared to some
others, but a lot of the other "magic" is pretty opaque stuff - the ORM being
an example, and also the abstraction over the network layer.

~~~
ntoshev
"Transparent" and "opaque" seem better terms than "good" and "bad".

ORMs usually let you fall back to SQL if you need to (ActiveRecord does), so I
would classify them as transparent. Relational databases themselves are opaque
though, they basically present you with a subset of relational algebra and how
its operations will be carried out depends on a myriad of parameters in a non-
trivial way. And you cannot possibly push some little modification into the
inner loop.
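
(Django's ORM has the same escape hatches as ActiveRecord; a hedged sketch,
with Book as a made-up model:)

    from django.db import connection

    # Fall back to hand-written SQL but still get model objects back:
    books = Book.objects.raw("SELECT * FROM app_book WHERE price > %s", [20])

    # ...or drop all the way to the cursor and raw driver rows:
    with connection.cursor() as cur:
        cur.execute("SELECT id, title FROM app_book WHERE price > %s", [20])
        rows = cur.fetchall()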

Another good example is Larry Page asking how to set a custom user-agent for
what became the googlebot in the then-current-JDK (and having no way to do
it):

[http://groups.google.com/group/comp.lang.java/browse_thread/...](http://groups.google.com/group/comp.lang.java/browse_thread/thread/6923c024ed392c85/88fa10845061c8ba)

~~~
andreyf
Re: ORM/SQL, compare Django's ORM/MySQL to how PG implemented storage in
news.YC - I've devoted several orders of magnitude more time to working with
the former, but I wouldn't be able to describe what happens in hardware
nearly as well as I could for the latter, which I read once (in a language I
never seriously used) nearly a year ago.

~~~
ntoshev
I don't have experience with your specific example, but it seems you are
complaining against the opacity of relational databases, not the ORM itself. I
agree, relational databases are very opaque.

~~~
nwatson
RDBMSs aren't that opaque ...

A relational database exists to store rows of data in tables and to relate
them, making CRUD operations possible in this context. At that level it's very
abstract. But it also starts to expose some implementation details as
appropriate, and that's why some DBAs make a very good living (I'm not a DBA
myself).

Transactions are required once there can be more than one DB client (well,
even before that when considering atomicity/durability). And most DBs expose
different transaction isolation levels because the various options have
different speed vs semantic reliability implications (e.g. dirty reads). These
make sense only in light of lower-level concepts that leak through from the
notion that data is stored on disk, buffered in memory, handled by threads,
protected by locks, etc.
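
(For example, that choice is exposed explicitly; a hedged sketch through a
Python DB-API driver, with the connection and table names made up:)

    cur = conn.cursor()
    # Trade strictness for speed on this transaction - the statement
    # below is as in PostgreSQL/MySQL.
    cur.execute("SET TRANSACTION ISOLATION LEVEL READ COMMITTED")
    cur.execute("SELECT balance FROM accounts WHERE id = %s", [42])
    conn.commit()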

And then there are details of indexing, clustering, distributing table data
among available disks, where to store the transaction log, how to
secure/encrypt data on disk, disaster recovery, support for high availability,
etc. These are all exposed leaks that relate to what the little heads on the
disk and the CPUs and network all do, in light of your desired performance
and reliability.

Some DBs also abstract out the back end, where you can plug in different low-
level storage engines or even expose your own data structures as virtual
tables (e.g. flat files or network data streams). You're able to write your
own back end if you want.

And I've heard of at least one effort to rewrite RDBMSs to run well on and
take advantage of the nature of SSDs vs. hard drives -- I'll be curious as to
how that turns out. I also believe the nature of such a system is
fundamentally different enough to make adapting existing RDBMSs to SSDs very
difficult.

I'd say then that many RDBMSs have done a very good job of exposing the
low-level details to the degree that's appropriate.

~~~
ntoshev
Well, in my opinion nothing is completely opaque or transparent, but to me
ORMs (given their ability to fall back to arbitrary SQL, which their main task
is to abstract away) are way more transparent than relational databases (that
constrain you to their relational model).

Indeed, RDBMSes give you a myriad parameters to tune, but if you think about
it almost all of them affect the behavior of the system in a global way and
almost none of them is really an escape hatch that lets you do things in your
non-relational non-ACID way (e.g. you can't choose not to persist _some_ of
your writes to a given table immediately, although you can choose to keep the
entire table in a memory-based storage engine in mysql).

SSDs are not that different from HDDs, I believe. The nature of hard disks is
that you should minimize random access. With SSDs you can do all the random
access you want _on reads_, but you are still better off writing contiguous
blocks. There are storage systems like Cassandra that were developed for HDDs
but conform well to this SSD-optimal pattern; it would be interesting to see
how they perform on SSDs.

------
dpatru
Ultimately everything is an abstraction, even the _"little heads reading off
the hard disk."_ People solve problems by framing them in convenient
abstractions, often switching fluidly between abstractions as it's useful.
Natural language reflects this through metaphor and analogy. Andrey is arguing
that programmers should take advantage of this technique as much as they can.
So they shouldn't use abstractions that prevent the use of other abstractions.

As an example, consider the paragraph above and some of the abstractions used:

 _framing_ \- comes from picture framing; bounding a view

 _fluidly_ \- meaning "like a fluid"

 _reflects_ \- reveals truth like a mirror

If I were barred from using abstractions from art, physics, and everyday
objects, it would have been harder for me to explain a computer programming
concept.

------
ntoshev
I believe transparent abstractions are a big part of what made Rails
successful: e.g. you have a primitive for an input box in a form, but you can
pass through arbitrary attributes to the generated HTML.
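
(The pattern, as a made-up Python helper rather than the actual Rails code:)

    def input_tag(name, value="", **attrs):
        # Any extra key/values pass straight through as HTML attributes -
        # the abstraction doesn't seal off the layer underneath.
        extra = "".join(' %s="%s"' % (k, v) for k, v in sorted(attrs.items()))
        return '<input name="%s" value="%s"%s />' % (name, value, extra)

    input_tag("email", placeholder="you@example.com", maxlength=64)
    # => '<input name="email" value="" maxlength="64" placeholder="you@example.com" />'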

~~~
sheriff
I think the form helpers in Rails are handy shortcuts, but I don't know if I'd
call them abstractions. Abstractions are tools for generalizing, but Rails
helpers only produce HTML output. Only if you could re-use the same helpers to
produce a variety of different types of markup would I call them an
abstraction (and that would probably make it necessary to make the abstraction
less transparent).

A better example of a true abstraction in Rails is ActiveRecord, which I
think you'll agree is a bit less transparent.

~~~
epochwolf
ActiveRecord allows you to query the database directly using execute() or use
a query to populate ActiveRecord objects with find_by_sql().

execute() returns the result in whatever form the underlying database driver
uses. I think that's about as transparent an abstraction as you could ask
for. You can completely ignore ActiveRecord's abstractions.

find_by_sql() returns an array of ActiveRecord objects.

~~~
zaphar
That's not transparent, that's discardable. Transparent would be knowing what
happens when you don't use execute() or find_by_sql(). Just because a library
allows you to avoid using it in the way it was intended to be used doesn't
make it transparent. It just means it recognizes that sometimes you have to
discard the library's features and do something else.

~~~
sheriff
Exactly. It's nice that AR makes it easy to go down a layer, but then you're
no longer abstracting away the database.

edit: to be fair, I was less than precise in my earlier post when I said that
ActiveRecord wasn't totally transparent. I was specifically referring to AR as
an abstraction that can be used identically across various different
databases.

------
barrkel
I found this amusing:

> _[...] smartest people I know prefer transparent abstractions [...] In the
> context of programming systems, this notion instantiates as the distinction
> between a language which makes it easy to look underneath high-level
> features and one which hides implementation details away. This is a general
> explanation of the attractiveness of small-kernel languages_

The irony here is that small-kernel languages, taking Lisp as the ultimate
example, hide _more_ of the machine from the programmer than does a language
like C. Because small-kernel languages have their abstractions written in the
language itself, those abstractions are not necessarily the best ones for the
machine. When the language has built-in abstractions which correspond to the
machine, I would argue it is _then_ that you are really seeing through your
abstraction.

This is not to say that I think C is therefore a better language than Lisp.
There are more important considerations than the degree of abstraction, such
as how expressive your language is for creating abstractions. If it's not
expressive enough, like C, that can cause worse problems than not being able
to see through to the hardware. On the other hand, if your language is very
malleable, you may end up with a different problem, where software drawn
together from lots of different libraries has lots of different abstractions
and ends up less interoperable than it should be.

------
dkarl
The smartest people gravitate, by taste and economics, to hard problems that
aren't already solved by existing abstractions. For the vast majority of
software projects, it's sufficient to rearrange existing abstractions in ways
consistent with their intended use. When you hit a legitimately hard problem,
it may offer an exciting chance to gain a real technological advantage over
the competition, but you have to make damn sure that the competition won't
blow by you because they found a way to _not_ solve the hard problem.

------
mattjung
Imagine the world without an abstraction called IP Layer.

~~~
access_denied
It would have another abstraction layer for a-hole behaviour instead.

------
stcredzero
Abstractions are _tools_. Very powerful tools. And like other such tools, they
can be misused and you can get yourself into deep &^%&^%#.

Very smart people can think about problems at varying levels of detail. They
know how to use abstractions properly. Sometimes this means not using them in
certain situations!

------
Confusion
 _we should avoid abstractions which permanently hide details, and instead,
seek out those which allow us to ignore the details when convenience allows,
but promptly think through the abstraction when necessary._

Why do you suppose such abstractions exist, and that the same abstraction is
of as much value to someone else as it is to you? Given the sheer amount of
detail at different levels, is it even feasible for someone to be aware of
all the relevant details? If not, no abstractions that allow you to 'see
through them' can possibly exist, because there is no underlying detail for
you to see.

