
The Design of Software is a Thing Apart - nancyhua
http://www.pathsensitive.com/2018/01/the-design-of-software-is-thing-apart.html
======
Chaebixi
> Those who speak of “self-documenting code” are missing something big: the
> purpose of documentation is not just to describe how the system works today,
> but also how it will work in the future and across many versions. And so
> it’s equally important what’s not documented.

Documentation also (can) tell you _why_ the code is a certain way. The code
itself can only answer "what" and "how" questions.

The simplest case to show this is a function with two possible
implementations, one simple but buggy in a subtle way and one more complicated
but correct. If you don't explain in documentation (e.g. comments) why you
went the more complicated route, someone might come along and "simplify"
things to incorrectness, and the _best_ case is they'll rediscover what you
already knew in the first place, and fix their own mistake, wasting time in
the process.

Some might claim unit tests will solve this, but I don't think that's true.
All they can tell you is that something is wrong, they can't impart a solid
understanding of _why_ it is wrong. They're just a fail-safe.

~~~
jph
> Some might claim unit tests will solve this

Yes. Tests will solve this. Your point is perfect for tests.

If another experienced coder cannot comprehend from the tests why something is
wrong, then improve the tests. Use any mix of literate programming, semantic
names, domain-driven design, test doubles, custom matchers, dependency
injection, and the like.

If you can point to a specific example of your statement, i.e. a complex
method that you feel can be explained in documentation yet not in tests, I'm
happy to take a crack at writing the tests.

~~~
squeaky-clean
Old and somewhat contrived example, but the first thing to pop into my head is
the famous fast inverse square root function.

    
    
        float FastInvSqrt(float x) {
          float xhalf = 0.5f * x;
          int i = *(int*)&x;         // evil floating point bit level hacking
          i = 0x5f3759df - (i >> 1);  // what the fuck?
          x = *(float*)&i;
          x = x*(1.5f-(xhalf*x*x));
          return x;
        }
    

I can't think of a way to write a test that sufficiently explains "gets within
a certain error margin of the correct answer yet is much much faster than the
naive way."

The only way to test an expected input/output pair is to run the input through
that function. If you test that, you're just testing that the function never
changes. What if the magic number changed several times during development? Do
you recalculate all the tests?

You could create the tests to be within a certain tolerance of the number.
Well how do you stop a programmer from replacing it with

    
    
        return 1.0/sqrt(x);
    

And then complaining when the game now runs at less than 1 frame per second?

Here's a commented version of the same function from betterexplained.com.

    
    
        float InvSqrt(float x){
            float xhalf = 0.5f * x;
            int i = *(int*)&x;            // store floating-point bits in integer
            i = 0x5f3759df - (i >> 1);    // initial guess for Newton's method
            x = *(float*)&i;              // convert new bits into float
            x = x*(1.5f - xhalf*x*x);     // One round of Newton's method
            return x;
        }
    

It still looks very magic to me, but now I get vaguely that it's based on
Newton's method, and what each line is doing if I needed to modify it.

I actually just found this article [0] where someone is trying to find the
original author of that function, and no one on the Quake 3 team can remember
who wrote it, or why it was slightly different than other versions of the
FastInvSqrt they had written.

> which actually is doing a floating point computation in integer - it took a
> long time to figure out how and why this works, and I can't remember the
> details anymore

This made me chuckle. The person eventually tracked down as the most likely
original author had to rederive how the function works in the first place, and
can't remember exactly how it works now.

I think the answer is both tests and documentation. Sometimes you do need
both. Sometimes you don't, but the person after you will.

[0]
[https://www.beyond3d.com/content/articles/8/](https://www.beyond3d.com/content/articles/8/)

~~~
kazagistar
Write a property based test, ie generate a bunch of random inputs, then assert
that all of them are within some (loose) margin of error.

~~~
squeaky-clean
This doesn't satisfy the time constraint though.

    
    
        return 1.0f / sqrt(x);
    

Passes a property-based test, but now your game doesn't actually run because
the operation is much too slow on hardware of that time.

You can also test execution time too, but that's finicky and doesn't help
explain how to fix it if you break that test (if there's no accompanying
documentation).

~~~
tome
But at this point all you're saying is "I can't think of a way of testing
performance".

~~~
reacweb
Performance can be tested in a unit test. You just need to measure the time
needed to compute the function on a given set of numbers, then measure the
time needed to compute 1.0f / sqrt(x) on the same set of numbers. The test
succeeds if your function is 10x faster. In the future, the test may fail
because sqrt has improved and the trick is no longer needed.

------
panic
Peter Naur's "Programming as Theory Building" also addresses this topic of a
"theory" which is built in tandem with a piece of software, in the minds of
the programmers building it, without actually being a part of the software
itself. Definitely worth a read:
[http://pages.cs.wisc.edu/~remzi/Naur.pdf](http://pages.cs.wisc.edu/~remzi/Naur.pdf)

~~~
jacobolus
The biggest problem is when users of software, programmers of software, and
the software code itself have 3 different incompatible theories of how it
works.

Sometimes it gets worse still: you can have different theories according to
(a) scientists doing basic research into physics or human
perception/cognition, (b) computer science researchers inventing publishable
papers/demos, (c) product managers or others making executive product
decisions about what to implement, (d) low-level programmers doing the
implementation, (e) user interface designers, (f) instructors and
documentation authors, (g) marketers, (h) users of the software, and finally
(i) the code itself.

Unless a critical proportion of the people in various stages of the process
have a reasonable cross-disciplinary understanding and effective communication
skills, models tend to diverge and software and its use go to shit.

~~~
taneq
This is why dogfooding is so important - you're updating the programmers'
model to align with the users' model, reducing the total problem space (and
thus the available avenues to get it wrong) by many degrees of freedom.

------
jrochkind1
I would love to see the pendulum swing back around to _good design_ again.

It matters more when designing libraries/frameworks than one-off apps.

Switching to a new framework/platform/language at the point the old one
finally matured enough that the need for good design was hard to ignore
doesn't actually help. You'll still end up back there eventually.

~~~
sekou
There have been articles over the last few years that have highlighted the
dangers of idolizing and prioritizing innovation. I too hope for increased
attention to craft and design of software in the future.

------
fooblitzky
The OP doesn't seem to understand TDD:

"So you update the code, a test fails, and you think “'Oh. One of the details
changed.'"

Some of the concerns they raise about writing tests are covered by Uncle Bob
here:
[http://blog.cleancoder.com/uncle-bob/2017/10/03/TestContravariance.html](http://blog.cleancoder.com/uncle-bob/2017/10/03/TestContravariance.html)
and here:
[http://blog.cleancoder.com/uncle-bob/2016/03/19/GivingUpOnTDD.html](http://blog.cleancoder.com/uncle-bob/2016/03/19/GivingUpOnTDD.html)

~~~
rimliu
Good design is difficult. That's the easy-to-understand part. What is
difficult for me to understand is why this skill is simply ignored. There are
lots of skills that are difficult, but people still persist in learning them.
Not so for software design. It-works-somehow-for-now seems to look good enough
for most. This also results in "OOD is difficult, FP will save us. Oh no, FP
does not really save us, FRP for sure will". Sorry guys, you will need to
break some eggs for that omelette.

------
eikenberry
Seems to me there might be ways to program that convey more information. For
example flow-based programming (FBP) seems like it might help and should help
make the flow of the program explicit and obvious. That is, inherent to the
code is a high level overview of what it does.

From my own limited experience it can make explaining a program to someone new
almost trivial. You just use the various flows defined as almost visual guides
to what is happening. I don't want to say FBP is a silver bullet, but I think
it points to the idea that it is possible to capture much more of the theory
and design of the program in the code.

------
eksemplar
We’ve increased our productivity by quite a lot over a five year period by
ditching most testing on smaller applications.

Basically our philosophy is this: a small system like a booking system, which
gets designed with service design and developed by one guy, won’t really need
to be altered much before its end of life.

We document how it interfaces with our other systems, and the as-is + to-be
parts of the business that it changes, but beyond that we basically build it
to become obsolete.

The reason behind this was actually IoT. We’ve installed sensors in things
like trash cans to tell us when they are full. Roads to tell us when they are
freezing. Pressure wells to tell us where we have a leak (saves us millions
each year btw). And stuff like that.

When we were doing this, our approach was “how do we maintain these things?”.
But the truth is, a municipal trash can has a shorter lifespan than the IoT
sensor, so we simply don’t maintain them.

This got us thinking about our small scale software, which is typically web-
apps, because we can’t rightly install/manage 350 different programs on 7000
user PCs. Anyway, when we look at the lifespan of these, they don’t last more
than a few years before their tech, and often their entire purpose, is
obsolete. They very often serve only one, or maybe two or three, purposes, so
if they fail it’s blatantly obvious what went wrong.

So we’ve stopped worrying about things like automatic testing. It certainly
makes sense on systems where “big” and “longevity” are things but it’s also
time consuming.

------
mpweiher
_the information of a program’s design is largely not present in its code_

And that's the problem. We need ways to make those higher level designs
(~architecture) code.

~~~
Jtsummers
This is the problem that I've run into trying to use formal methods.

I love them, I can express some things very concisely and even clearly. But
there's no direct connection to the code and so keeping things synchronized
(like keeping comments synchronized with code) is nigh impossible.

We need the details of these higher level models encoded in the language in a
way that forces us to keep them synced. Type driven development seems like one
possible route for this, and another is integrating the proof languages as is
done with tools like Spark (for Ada).

This will reduce the speed of development, in some ways, but hopefully the
improvement in reliability and the greater ability to communicate _purpose_ of
code along with the code will also improve maintainability and offset the
initial lost time.

And by keeping it optional (or parts of it optional) you can choose (it has to
be a conscious choice) to take on the technical debt of not including the
proofs or details in your code (like people who choose to leave out various
testing methodologies today).

~~~
hinkley
Gilad Bracha wandered off to work on progressively typed languages after he’d
had enough of trying to fix Java’s type system.

I think if you took something like JSDoc and gave it more teeth, you could do
something like this in just about any of the dynamically typed languages.

~~~
mpweiher
Well, he also "wandered in" from doing optionally statically typed
languages...see Strongtalk and Newspeak. :-)

------
Chiba-City
This is a lovely article. Software is possibly a) an errant and b) a
misinterpreted operational semantics of some other semantic horizons of
contractual or implicit expectations. Knuth's Literate Programming was onto
something. We inhabit a world of word problems and even faulty realizations of
rarer formal specifications. Claims concerning "phenomena in the world" drive
maintenance and enhancement regimens.

~~~
hinkley
Worse still, most of us walk around under the delusion that we know what we
want while others can see it doesn’t make us happy.

How do you get the product you want when you don’t know what you want?

~~~
jiggunjer
Just buy Apple. They know what you want.

~~~
fiddlerwoaroof
You may be joking, but I think the way in which this is true explains Apple’s
success, even though they’ve generally released products that are less
featureful and significantly more expensive than their competition.

------
charlysl
Wouldn't it be better to use data abstraction instead of abusing primitive
types?

For instance dates are often abstracted as a Date type instead of directly
manipulating a bare int or long, which can be used internally to encode a
date.

So, age, which isn't an int conceptually (should age^3 be a legal operation on
an age?), could be modelled with an Age type. This, on top of preventing
nonsense operations, also allows automatic invariant checking (age > 0), and
to encapsulate representation (for instance changing it from an int
representing the current age to a date of birth).

------
robotresearcher
return x >= 'A';

Would be better than

return x >= ASCII_A;

surely. ASCII_A could be set incorrectly, or have a dumb type, and is more
verbose anyway. By using the character directly, the code speaks its purpose.

~~~
coldtea
> _ASCII_A could be set incorrectly, or have a dumb type, and is more verbose
> anyway. By using the character directly, the code speaks its purpose._

I disagree. ASCII_A speaks its purpose (we purposefully want an ASCII A
stored here). And one can check the constant's definition and immediately
tell whether it's correct. E.g.

    
    
      const ASCII_A = 'A' // correct
    
      const ASCII_A = 'E' // wrong
    

So:

    
    
      return x >= ASCII_A
    

tells us the intention of the code's author.

Whereas:

    
    
      return x >= 'A';
    

only tells us what the code does, which may or may not be correct (and we
have no way of knowing, without some other documentation).

So, by those two lines:

    
    
      const ASCII_A = 'E';
      (...)
      return x >= ASCII_A;
    

We know what the code is meant to do, AND that it does it wrongly (and thus,
we know what to fix).

This line, on the other hand:

    
    
      return x >= 'A';
    

tells us nothing. Should it be 'A'? Should it be something else? We don't
know.

~~~
phkahler
return x >= "A"; // ascii A

Gets the whole message across in one line, as does using 65 with the comment.

~~~
robotresearcher
(Ignoring the typo "A" != 'A')

return x >= 'A';

 _already and only means_ ASCII A. Is there a C compiler anywhere, now or
likely in the future, where 'A' is NOT ASCII A? The comment is redundant if
correct, and could become wrong after an edit, so it has no value.

~~~
coldtea
> _return x >= 'A'; already and only means ascii A._

See, here's where you are wrong.

    
    
      ASCII_A = "A"
    
      alphas = ["Α", "А", "Ꭺ", "ᗅ", "ꓮ", "Ａ", "𐊠", "A", "𝐀", "𝖠", "𝙰", "𝚨", "𝝖"]
    
      for c in alphas:
          print(c == ASCII_A)
    

Output?

    
    
      False
      False
      False
      False
      False
      False
      False
      True
      False
      False
      False
      False
      False
    

Several of the numerous possible Unicode alphas. Those are not A in different
fonts -- they are different Unicode characters that look like A. And depending
on your font they could look exactly the same as the plain ASCII A (of which
only the one towards the middle of the list is). And depending on your locale
and keyboard layout, one of them could be as easy to type as the regular
English A in ASCII.

~~~
robotresearcher
I deliberately used the character literal ‘A’ and not any of your UTF8
strings. I think you are mistaken to confuse a character with your strings. Is
this wrong?

~~~
coldtea
You can have a unicode character literal -- and depending on the language
there's no distinction between character and string (at the type level), a
character is just a string of length 1.

~~~
robotresearcher
I was assuming C, where there is a difference.

int main() { printf( "%d %p\n", 'A', (void *)"A" ); return 0; }

produces 65 followed by an address, since the character constant 'A' is just
the int 65, while the string literal "A" evaluates to a pointer to its first
character. (Printing that pointer with %d, as I originally wrote, is
technically undefined behavior.)

------
moolcool
I wish websites wouldn't change the browsers default scrolling behavior

------
bringtheaction
I misread the title as "the design of software is a thing of the past". I
welcome the actual title and content though.

------
erpellan
I lost interest after 'that is a fatal mistake'.

Fatal mistake? Really? An unrecoverable failure?

So, none of the software I've written in the last decade worked, despite all
evidence to the contrary?

Right.

