How programmers make sure that their software is correct (lemire.me)
104 points by ibobev on Jan 6, 2022 | 126 comments



I find it interesting that while logic bugs (e.g. off-by-one errors, switched if conditions, etc.) make up less than a third of the bugs I encounter day to day, they seem to account for about 80% of all discussion around testing.

Bugs caused by misunderstood or unexpected interactions between subsystems, and specification bugs, seem to predominate these days IME, but discussion of sophisticated ways of dealing with those seems fairly scant.


The roots of the problem you described are many. One is that there are very good unit test libraries out there, which encourages writing unit tests. That sounds good, and is a good starting point, but it often becomes the be-all and end-all of testing.

In my experience working on many very large software systems, I have encountered one that was head and shoulders above the others in quality. This is not just a hunch. There were metrics tracked, things like MTBF, percentage of sessions with clean exits, etc., and this software was ridiculously better than the others.

The difference lay entirely in the fact that the better software had almost no unit tests, but had 21K+ integration tests. I've worked on other software that also had more integration tests than average, and it too was of better quality.

The problem is that making integration testing easy needs to be designed in from the start, and requires not just one simple trick but multiple interlocking design decisions that accumulate into making integration testing easy. In this particular case, one could write a new test just by using the software, and a new test would be generated automatically. That's how you achieve writing 21K integration tests.

There are many unit test frameworks, but I've yet to see a generic integration testing framework. The reason is that integration tests run against the whole software, so the whole software needs to be designed around them.


> In this particular case, one could write a new test just by using the software and a new test would be generated automatically

I'm interested in the topic. Could you talk more about it, or point to some resources describing in more details how this was achieved/what decisions had to be taken?


I’ve written a great deal of tests over the years and this sounds very interesting to me, yet it’s not obvious to me from your comment what this would look like. I’d like it very much if you could elaborate further.


Integration tests are hard to do right, but easy to hack together. You might have to do the whole "Start with a clean DB, fake push a button, check for this record" thing, but at least you'll have some tests.

They might be a maintenance nightmare but... they'll probably be better than no tests
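For what it's worth, the boiled-down shape of that kind of test looks something like this (the "app" and "record store" here are toy stand-ins I made up so the example is self-contained; the point is only the clean-state, poke-the-real-entry-point, inspect-what-got-stored rhythm):

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import java.util.ArrayList;
    import java.util.List;
    import org.junit.jupiter.api.BeforeEach;
    import org.junit.jupiter.api.Test;

    class ButtonPressIntegrationTest {
        // Toy stand-ins; in a real suite these would be the actual app and a real (clean) DB.
        static class RecordStore { final List<String> records = new ArrayList<>(); }
        static class App {
            final RecordStore db;
            App(RecordStore db) { this.db = db; }
            void pressSave(String payload) { db.records.add(payload); }  // public entry point
        }

        RecordStore db;
        App app;

        @BeforeEach
        void cleanSlate() {
            db = new RecordStore();   // "start with a clean DB"
            app = new App(db);
        }

        @Test
        void pressingSaveCreatesOneRecord() {
            app.pressSave("hello");                // "fake push a button"
            assertEquals(1, db.records.size());    // "check for this record"
        }
    }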


I guess it depends on the system you build and its maturity - in my area (automation for finance and revenue management), 80% of bugs are related to business logic and misinterpretation of user requirements.


Errors are highly domain-specific. When I worked in finance, I would have very much agreed with your description. But then, the code I wrote was fairly trivial; the challenge was communicating with the users, and sometimes getting the users to understand that what they were asking for wasn't what they wanted.

I, the same developer, spent a good hour recently trying to iron out the bugs in a piece of code that concatenated two integers bit-wise and calculated the offset point. That is, given two integers whose bits are

  00000000,00000000,00000000,0PPPPPPP
  00000000,000000QQ,QQQQQQQQ,QQQQQQQQ
The code should produce the shortest array of bytes that looks like

  ZZZZZZZZ,0000000P,PPPPPPQQ,QQQQQQQQ,QQQQQQQQ
Where Z in this case would be 18, the offset of P's bits. This is finicky in a way I've never encountered while I worked in finance. It's not exorbitantly difficult, but there are just a lot of potential logic errors to make.

The code, ultimately, looked like this (it's Java hence the extra awkward bit twiddling)

        int pw = bitWidth(p);
        int qw = bitWidth(q);
        int pwr = (pw%8);

        int totalSize = (pw + qw)/8 + (((pw+qw)%8)>0?1:0);

        byte[] data = new byte[totalSize + 1];
        int b = data.length - 1;
        while (p > 0) {
            data[b]=(byte)(p&0xFF);
            p = p >>> 8;
            if (p != 0) {
                b--;
            }
        }

        if (pwr != 0) {
            while (q > 0) {
                data[b--] |= ((byte) ((q << (pwr)) & 0xFF));
                q = q >>> (8-pwr);
                data[b] |= (byte) (q & 0xFF);
                q = q >>> pwr;
            }
        }
        else {
            while (q > 0) {
                data[--b] |= (byte)(q&0xFF);
                q = q >>> 8;
            }
        }
        data[0] = (byte) pw;

        return data;

It's like 40 lines of code. It shouldn't be that hard. Yet it is. Bit-twiddling is error prone and hard to reason about and, dunno, maybe pwr isn't a great variable name, but the code really doesn't get particularly expressive even if you change it to pWidthRemainder or whatever. It's still basically a bunch of cthulhu fhtagn as far as I'm concerned.


This code would be less bullshit if the C language (or C++ or any successor) enabled bit arrays and slices, just like byte arrays work. Bit-fiddling grows from the fact that there is no “addressable bit sequence” type in the language. I wonder how many bugs could be erased by simply allowing the following in such algorithms:

  bit data[length];
  int width = 6;
  bit value[width] = random(64);
  for (i=0; i < lim; i += width) {
    int j = endof(i, width);
    data[i:j] = value;
  }
or something like

  uint8_t c = 0xff;
  c[0..3] = 1;
  c[5..7] = 1;
  // c == 0x31


C++ has `std::bitset` for when the length is known ahead of time [0].

Otherwise, `std::vector<bool>` _may_ be specialized to occupy one bit per element [1].

[0]: https://en.cppreference.com/w/cpp/utility/bitset

[1]: https://en.cppreference.com/w/cpp/container/vector_bool


Both (and boost’s version too) can only address/assign one bit at a time, so they introduce the mirrored problem: how do you splice an n-bit integer into them without fiddling? Of course one can always abstract the ops from my snippet away into some library, like setbits(ptr, offset, len, value); that’d still be valid, no need for fancy overloads even. But when batteries aren’t included, nobody cares enough to buy them separately. We know that from the situation with C strings.
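For illustration, a plain Java version of that setbits idea might look like the following (the name, signature, and MSB-first bit ordering are my own guesses for the sketch, not anything standard):

    // Write the low `len` bits of `value` into `data`, starting `bitOffset` bits
    // from the most-significant end of the array (bit 0 of a byte is its MSB here).
    static void setBits(byte[] data, int bitOffset, int len, long value) {
        for (int i = 0; i < len; i++) {
            // Take the bits of `value` from most significant to least significant.
            int bit = (int) ((value >>> (len - 1 - i)) & 1L);
            int pos = bitOffset + i;        // absolute bit index in the array
            int byteIndex = pos / 8;
            int shift = 7 - (pos % 8);
            if (bit != 0) {
                data[byteIndex] |= (byte) (1 << shift);
            } else {
                data[byteIndex] &= (byte) ~(1 << shift);
            }
        }
    }
Nothing clever, but once something like it exists, the concatenation upthread becomes more or less two calls to it plus a width byte.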


I recently wrote a microbenchmark where I convinced myself that doing operations on a [64]bool was just as fast as doing the same (bitwise) operations on an int64.


I think it’s the language. I haven’t really tested it, but in C# it would be something like this, 8 lines of code instead of 40:

    public static byte[] func( uint p, uint q )
    {
        int pw = 32 - BitOperations.LeadingZeroCount( p );
        int qw = 32 - BitOperations.LeadingZeroCount( q );

        Span<byte> buff = stackalloc byte[ 9 ];
        ulong val = (ulong)q | (ulong)p << qw;
        BinaryPrimitives.WriteUInt64BigEndian( buff.Slice( 1 ), val );

        int totalBytes = ( pw + qw + 7 ) / 8;
        buff[ 8 - totalBytes ] = (byte)qw;
        return buff.Slice( 8 - totalBytes ).ToArray();
    }
In C++ it’s about the same, although you’ll need either the C++20 language version or some non-standard compiler-specific intrinsics for bit scan and byte swap.


Yeah, Java is almost certainly not the optimal language for low level programming. It's like C++ with oven mitts.


I think I would use BigInteger for that. You can create them from and convert them to arrays of bytes, count bits, shift them left and right, and add them, resizing storage where needed.

https://docs.oracle.com/javase/7/docs/api/java/math/BigInteg...
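A rough sketch of how that might look for the concatenation upthread (assuming p and q are positive, as stated later in the thread; concat is just my name for it):

    import java.math.BigInteger;

    static byte[] concat(int p, int q) {
        int pw = BigInteger.valueOf(p).bitLength();            // width of p in bits
        BigInteger qp = BigInteger.valueOf(q).shiftLeft(pw)    // q in front of p,
                .or(BigInteger.valueOf(p));                    // as in the original code
        byte[] body = qp.toByteArray();                        // big-endian bytes
        byte[] out = new byte[body.length + 1];
        out[0] = (byte) pw;                                    // width byte in front
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }
One caveat: toByteArray() is two's-complement, so it can include a leading zero byte when the top bit of the highest byte is set, which makes the output one byte longer than the hand-rolled version in that case.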


I don’t know if it could help in your case (Java), but there’s a machine instruction for counting leading zeros on the x86 platform - LZCNT https://en.m.wikipedia.org/wiki/X86_Bit_manipulation_instruc... - I found out about this randomly years ago when I was doing some interfacing with a microcontroller from Python and my solution was horrifically slow.

To exploit this I had to use Cython from Python to define a C function which used a GCC compiler intrinsic to allow me to write the asm instruction inline. It turned out to be about 7 lines of code, so it wasn’t as bad as it sounds.

In your case you’d need to use JNI, I suppose, so it might not be any less code, but it should at least be a lot faster if that matters.


Java has Integer.numberOfLeadingZeros(), which I suspect exposes LZCNT where available.
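For illustration, a bitWidth() helper like the one in the snippet upthread could presumably be written on top of it (just my guess at how it was defined):

    static int bitWidth(int x) {
        // numberOfLeadingZeros(0) == 32, so bitWidth(0) == 0
        return 32 - Integer.numberOfLeadingZeros(x);
    }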


Yeah, and where that's not available Java probably uses the BSR instruction, which does essentially the same thing but was introduced with the i386 (1985): https://www.felixcloutier.com/x86/bsr


What on Earth required this? Is this part of some home grown compression scheme?


It is indeed, it's part of the compression scheme for the keyword lexicon for my search engine. It's a domain-specific compression that uses the fact that the keyword dictionary is a dictionary of keywords to compress N-grams into dense blobs using this function. Overall it achieves about 70% compression.

The lexicon supports several billion entries entirely in memory, which requires a lot of these eldritch coding tricks (Java certainly wasn't built for that many objects; GC and header overhead are extremely prohibitive, so objects are entirely out of the question).

Most of the data is kept in contiguous memory mapped byte arrays.


Doesn't this code actually put q in front of p? That is:

  ZZZZZZZZ,0000000Q,QQQQQQQQ,QQQQQQQQ,QPPPPPPP
With Z still being the width of P?


Yeah, that's just me describing the requirements from memory :P


Also, it seems to fail if either p or q is 32 bits long - you get zeroes instead of the bits of q.

Here's my attempt, which is shorter, and i think quite a bit clearer:

    int pw = bitWidth(p);
    int qw = bitWidth(q);
    int qpw = qw + pw;

    long qp = Integer.toUnsignedLong(q) << pw | Integer.toUnsignedLong(p);

    int qpwBytes = ((qpw - 1) / Byte.SIZE) + 1;

    byte[] bytes = new byte[qpwBytes + 1];
    bytes[0] = (byte) pw;
    for (int i = 1; i < bytes.length; i++) {
        bytes[i] = (byte) (qp >> (qpwBytes - i) * Byte.SIZE & 0xff);
    }

    return bytes;
This did take me quite a bit of fiddling to get right - but it wasn't the bit-twiddling, it was the index arithmetic for copying the bytes into the array!


Yeah, p and q are guaranteed to be positive - I should have mentioned that. But that is indeed a more elegant solution.


That's at least 5 functions of processing munged into one.


You're welcome to demonstrate that I'm wrong, but in my experience, Uncle Bobbing this type of code doesn't make it much clearer unless you introduce so many abstractions you see significant penalties.

This is hot code with an invocation count in the billions, so among the requirements is that it shouldn't allocate objects while at the same time being thread safe. That's a tricky combo.


Regardless of the language, the function is too complex.

Every while loop could be its own function, and then you have a bunch of simple unit tests. Unit tests are literally the double-entry accounting for the logic, which is where the value is.

Any unit tests for that code are going to be horrendous if you want decent coverage... and that's how it's easy to know it's too complex. Functions, especially small ones, can be considered free, given that the JVM inlines them anyway... but I understand this is a hot path. If you want to avoid re-allocation, Java gets in your way. You could split up the function and use a singleton class for holding the array, without having to pass the array around and while still being able to reference the singleton... blah blah blah.

> Uncle Bobbing this type of code doesn't make it much clearer

I get it and agree, while still recognizing the problem. Not you, not the code, not me. Java.

It's really unfortunate that Java has damaged programming practices so subtly and completely by being unnecessarily byzantine AND pushing horrible solutions like Spring.


I don't subscribe to the notion that every branch of every function needs to be tested. That just seems like an anxious waste of time, not even considering the gargantuan pile of additional work it creates in maintaining the tests.


That's because these are knowable errors; you can't talk about a bug if you don't know it might occur.

Testing is not a panacea, by design you can only test for what you know and predict. In addition to testing, your setup needs to be resilient enough that unpredictable bugs can be solved quickly and won't lead to data loss.


I don't think we do this because integration and specification bugs are inherently "unknowable". I think it's more cultural than technical.

I've also noticed a parallel phenomenon where teams double down on unit tests for branch conditions, coverage %, etc., when the bugs that are really hurting them predominantly can't even be detected by unit tests.


I’m with you: we spend most of our time testing for the bugs we are more comfortable with, but they all rest on some underlying assumption about how we integrate with outside systems. Defensive programming has often taken a backseat to delivering features, and testing has become more of a rubber stamp against regressions than a test of resilience. A lot of modern software, especially web apps, is very brittle; when something stops working the way it’s supposed to, the whole app goes into an undefined failure mode instead of giving the user partial functionality.


I inherited a distributed app that causes me a lot of off-hours work.

My goal: the build pipeline brings up the app in docker-compose, then randomly kills services while running integration tests. So many failures happen because people only ever consider the happy path in a distributed/highly modular system.


> Testing is not a panacea, by design you can only test for what you know and predict.

Fuzz testing is still a type of testing.


Fuzz testing is based on us knowing that certain kinds of bugs can be triggered by certain kinds of input, even if we don't know the exact details.


No it isn't. Fuzz testing is based on us knowing that bugs probably exist, and no more than that. The entire concept is that you provide random input and see whether bugs occur. You're not looking for particular kinds of bugs.


We know that programs sometimes expect data with certain relationships between different parts. For example, one part may be a length field, and the program expects that some other part of the data will have that length, and may fail to handle it correctly if the data does not conform to that assumption. Fuzzing is a... well, a slightly-better-than-nothing way of trying to activate that kind of bug.

Could fuzzing activate other kinds of bugs? Sure. But it's not magic. It's mostly working on known kinds of bugs.


You have some strange ideas about what fuzzing is.


If you know more, you could provide some clarity, rather than just insults.


Fuzz testing requires a test oracle to tell you whether the output is good or bad. Otherwise, all you're checking for is crashes which is unlikely to yield much.


You're partially right.

If you put asserts everywhere in your code, to test all the assumptions the code makes, then fuzzing is an excellent way to see when one of those assumptions is violated, because those violations are turned into crashes.

In my experience, fuzzing with extensive asserts is one of the best ways to suss out bugs. It gets even better if you can take all of the paths the fuzzer found and run them through something like ASan or Valgrind.

You can read more about that at [1].

[1]: https://git.yzena.com/gavin/bc/src/branch/master/manuals/dev...
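As a minimal, self-contained sketch of the idea (the real setup in [1] uses an actual fuzzer plus sanitizers; this is just the cheapest version, hammering a function with random inputs and letting assertions report violated assumptions - run with java -ea):

    import java.util.Random;

    public class FuzzSketch {
        static int bitWidth(int x) {
            return 32 - Integer.numberOfLeadingZeros(x);
        }

        public static void main(String[] args) {
            Random rng = new Random();
            for (long i = 0; i < 50_000_000L; i++) {
                int x = rng.nextInt(Integer.MAX_VALUE) + 1;   // positive inputs only
                int w = bitWidth(x);
                // The assumptions, stated as asserts: x fits in w bits, and w is minimal.
                assert (x >>> w) == 0 : "x does not fit in " + w + " bits: " + x;
                assert (x >>> (w - 1)) == 1 : "width is not minimal for " + x;
            }
            System.out.println("no assertion fired");
        }
    }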


This sounds like the problem is that despite all of the effort at preventing simple programming errors, a third of all the bugs you encounter are still "simple" programming errors.

Programming errors are of special interest to programmers, as we tend to judge each other a bit harshly when these errors emerge, since the problem is not on the business side or with other teams' requirements.


I am both an accountant (actually educated as one) and a developer. I will not rant (today) about how I disagree with unit testing and how there are better ways to get the same result.

I just wanted to point out that the double bookkeeping analogy is bad.

Double bookkeeping is there to make certain types of errors very visible.

Double bookkeeping comes from the idea that things do not just show up or disappear, but change state -- value appearing in one account must have come from another. And so double bookkeeping is basically saying: we want a system where it is immediately visible, or even impossible, to write something down that does not preserve this invariant.

It has nothing to do with whether the operation is accounted for correctly. You can still mess up accounts very easily.

Double bookkeeping does not mean double checking. You just select two accounts and put the same value on opposite sides of those accounts. This is a single operation. Everything about it can be wrong -- the value can be wrong, the accounts can be wrong and nothing is going to catch that error.

Actually, using invariants is one technique I use to ensure my code works correctly. It has nothing to do with unit tests. I just write code, whenever possible, in a way that makes it impossible to put it into a wrong state.
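A toy sketch of what I mean (entirely made up, not real accounting code): the only way to move value is a single transfer that debits one account and credits another, so "value appeared from nowhere" is unrepresentable - even though, as said above, the accounts or the amount can still be the wrong ones.

    import java.util.HashMap;
    import java.util.Map;

    final class Ledger {
        private final Map<String, Long> balances = new HashMap<>();

        Ledger(Map<String, Long> openingBalances) {
            balances.putAll(openingBalances);
        }

        // Debit and credit happen as one operation; there is no setBalance().
        void transfer(String from, String to, long amount) {
            if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
            balances.merge(from, -amount, Long::sum);
            balances.merge(to, amount, Long::sum);
        }

        // The invariant the structure preserves: total value never changes.
        long total() {
            return balances.values().stream().mapToLong(Long::longValue).sum();
        }
    }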


Regardless of the relative value to ascribe to unit testing, the confidence value is obvious from the intentional and functional double checking, which can be done piecemeal. There is still a problem with added side effects, which unit tests don't pick up on. This will probably be addressed with function signatures... which doesn't do much but alert for changes. Eventually unit testing can start to approach formalism with monitored functions, wherein side effects must be asserted for a unit test to succeed.


Yes, tests are good.

At the same time, my measuring stick is always, "Is the customer happy?" Software will never be perfect, so a zero bug rate is not a reasonable goal. Happy customers are quite a reasonable goal, and when they are not happy, their concerns direct where you need more quality.


Tell that to NASA or your local nuclear power station operator.


There's always a tradeoff between the cost of finding a defect before shipping and the cost of finding it after shipping.

NASA and nuclear power stations have a rather high cost to remediate some software defects, so it makes sense to work much harder to find them earlier. Same thing with anything life safety critical. You can't patch a bug and uncrash your space probe, or unmeltdown your powerplant, or unkill your patient. But a lot of other stuff is ok; yeah, it's not great if music players don't work on Jan 1 of years after a leap year, but you'll have four years for the product to decline into irrelevancy before that happens again (or to get patches out, if Zune had market fit)


It still holds true - their standards for reaching the level of "happy" will clearly be more stringent than others'. The same goes for hospitals or any customer whose work can make a life-or-death difference. But they still care about the result of the product, and they set the standard you need to hit.


You are right. Thank you for being kind and constructive even though I was snarky and did not fully get that you in fact did not exclude these cases.


It's funny you say that because the next thing I pictured was Homer Simpson and realized that thousands of brilliant engineers from all walks of life may eventually have all their effort wasted because someone left their station for donut day.


Good engineering means building systems that can survive Homer Simpson.


Do you really think NASA software is bug free?


That was not implied. The issue was bug tolerance being measured by customer happiness, which is an ephemeral metric and a silly concept in many industries.


NASA's bug tolerance is clearly measured by customer happiness too. Their customers are very unhappy if rockets blow up all the time (especially with people on board), therefore they have a very low bug tolerance.


The idea that NASA's goals and limitations are equated with happiness is ridiculously reductive.

> Do you really think NASA software is bug free?

Ignoring the fact you are arguing in bad faith is not compelling

GL with whatever.


Software will never be perfect, but most people aren't actually arguing for perfect. I've seen complicated web apps shit the bed with single-digit requests per second, and those developers also said it doesn't need to be perfect. Forget perfect, how about "doesn't immediately fall over under load"?


As far as I'm aware, unless your software is Hello World simple, the only way to make sure it's correct is to not write it in the first place.

After that point, everything is a compromise.


Well, my software is always correct.

Sometimes the spec is a bit wrong, like forgetting to be explicit in precisely how the system should crash on a certain unexpected input.


Great answer!


I write UI and communication software.

It is pretty much impossible to write meaningful tests that cover everything.

This is especially true, when you mix complex, asynchronous server interactions, with random-access UI.

Today, I am fixing a couple of bugs in an app that interacts with three different servers (all asynchronously, but some semi-synchronously. The timing diagram is a bit ... intense), and also allows the user to do things like switch contexts, while some part of the server interaction is still unresolved.

I write a bit about my approach to testing, here: https://littlegreenviper.com/miscellany/testing-harness-vs-u...


I started reading through your site a bit. Great content.

Wanted to let you know that the "Here is the Doxygen-generated documentation for the BADGER layer" link is broken on this page: https://littlegreenviper.com/miscellany/forensic-design-docu...


Hey, thanks!

I'll look at that.


You can try, though - and demonstrate that you did. I've seen bugs that would have been trivially caught if the software had been run even one time - so it was clear that the software had never been run even one time.


I've had "senior engineers" and engineering directors submit PRs that just flat out don't work. Like, I ran a minimum working example and it just... didn't work. I swear to God, some people are actually professionally negligent at this point.


This is my favorite. Doesn't even compile, or crashes immediately, and yet... it's somehow in a PR.


I've done that. Spend days testing a patch, getting everything just right. Then, as you're doing final touchup, you introduce a syntax error with a wrong keystroke and fail to notice before pushing to the PR branch.

But, of course, this is why we use CI.


> ... Hello World simple

And even then, only if you stick to the English string hello world.

I have had hello worlds fail, in other languages :)


Hello world definitely can fail even in English. It's just that nobody checks to make sure the print operation succeeded.
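In Java, for instance, System.out swallows write errors, so the only way to notice a failed print (say, stdout piped somewhere that's full) is to ask the stream afterwards - something almost no hello world does. The exit code 74 below is just an arbitrary choice on my part:

    public class Hello {
        public static void main(String[] args) {
            System.out.println("Hello, world");
            if (System.out.checkError()) {   // true if any write on System.out has failed
                System.exit(74);
            }
        }
    }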


Formalized verification would say otherwise.


At its worst, formalized verification is an elaborate and overcomplicated process for converting coding bugs into specification bugs.


This is a huge copout and isn't at all correct. It dismisses a bunch of insanely intelligent people's life work.


Can't you use this same argument when being critical about any technology, or really anything complex at all?

It is some kind of logical fallacy - just because really smart people worked on something and dedicated their lives to it doesn't mean it is good or correct or that it works.


The point was more that the parent comment was clearly dismissing the work without understanding any of it. They employed a sound bite to dismiss something. You're welcome to go read the prose on the subject. The parent commenter clearly hasn't.


It's a wry observation that I made when I was forced to learn it by those very same researchers who hyped it up as the future of software development 20 years ago.

It was sold as the “Rolls-Royce of testing”, which I think overstated its capabilities and effectiveness.


Okay but things are entirely different than 20 years ago.


The problem with formal verification is that in practice, many problems can't even be formally specified. Try formally specifying the requirement "the GUI should work as expected" – it's hard enough even to test this, and impossible to formally specify.

Of the remaining problems, the complete, formal specification would be larger, more complex, and more bug-ridden than the implementation. "Hey adwn, can you add a button to the GUI so the user can export the current dataset to a CSV file?" Maybe 20-50 lines of code, but a full specification would be a multi-year project – and it would be almost certainly wrong.


> The problem with formal verification is that in practice, many problems can't even be formally specified.

And even if they can, where does that formal spec come from? Sooner or later, it comes from some informal understanding of what's supposed to happen. Well, how do you check that the formal spec is correct? Because if you can't do that, all formal verification can do is tell you that your program correctly implemented the wrong thing.

And I don't think you can do that. First, I don't think you can go from informal to formal by formal means. Second, even if you can, you still can't verify the informal spec. So formal verification can only go so far.

I can recall several things that were bugs in the spec. Log4j is the most recent example. Heartbleed I think was a specification bug. Back in the day, a lot of computer viruses were helped on their way by bad ideas in the specification of Microsoft Outlook.


Not everyone has a spare 1000 programmer-years to write their script in Coq though.

Model checking can be pretty productive relatively speaking, however.


Yeah this is unfortunately true. The developer experience aspect needs a lot of work here. I don't disagree. But the assertion that you can't write bug free code is still false, regardless of the amount of effort it takes.


Lots of my bugs are because of unexpected data. I imagine these systems cannot account for that?


That can be improved if you practice "switching hats" and try to break your own code in all the ways you can think of. That is something worth doing when you review other people's PRs as well. Expect all possible inputs before they hit you. Kind of a human fuzzer.


Well, yes this is good advice no matter what, but it doesn't get rid of the bugs!


Of course they can.


How?


Among other things, they demand/receive better requirements. Underspecified and underconstrained systems leave a lot of space for variation in the conditions “they may find themselves working in”. I put that in quotes because software is not sentient, and it’s really the programmer who encounters that enormous space at implementation time (or ignores it for some reason). When that happens, they often can’t ask anyone what to do, because nobody except them could understand the problem deep in the structure of a program/process. It leads to defensive approaches which may or may not trigger in the wild, producing cryptic messages, because the situation was hard to formulate in the first place. Unrestricted underspecification puts roots deep into the code, when it could have been cut off much earlier. A programmer’s clients and employers are rarely aware of that, and it’s the programmer’s responsibility to specify the details as deeply as the client is able to grasp.

It’s another (IMO the most important) level of correctness, unrelated to the rigid math-like correctness of the result of programming against completely coherent requirements, which almost none of us ever receive.


Very well explained. Looking back, I remember the projects where you had one of those unicorn colleagues/customers who truly could tie business and technical complexities together. It's often those projects that end up the best, because you actually CAN ask someone "uh, I can see that this weird state could appear in the code, is this something we should work around in a special way, or is the state not representable?"


I find it odd that there was no discussion about figuring out edge cases and testing specifically for those instead of random values. Especially for functions with large domains, a few hand-picked test values could give much more confidence than thousands of random values.

Additionally, with generated test values, figuring out the validation logic becomes complex, and in complexity lies potential for bugs. If your testing code is more complex than the code being tested, do you really gain that much confidence about correctness? With hard-coded test values, by comparison, you can also hard-code the expected results, which can be taken from some reference or just manually checked for correctness.

Of course, ideally you'd have both random fuzzing and more manually defined tests, but that is setting quite a high bar.
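To make it concrete with a small function from upthread (bitWidth, assumed here to be 32 - numberOfLeadingZeros): a handful of hand-picked boundary values with hand-checked expected results, rather than thousands of random ones.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class BitWidthEdgeCaseTest {
        static int bitWidth(int x) { return 32 - Integer.numberOfLeadingZeros(x); }

        @Test
        void boundaries() {
            assertEquals(0, bitWidth(0));
            assertEquals(1, bitWidth(1));
            assertEquals(8, bitWidth(255));               // all ones just below a byte boundary
            assertEquals(9, bitWidth(256));               // power of two just above it
            assertEquals(31, bitWidth(Integer.MAX_VALUE));
        }
    }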


That's par for the course if you use an actual framework for property-based testing. All the ones I've tried certainly mix the usual suspects (-1, 0, 1, MAX_INT, empty list, etc.) into the randomly sampled data.

The author of this piece probably chose to make the post framework-agnostic by rolling their own random generation, but left that nuance out.


"Our most important goal in writing software is that it be correct."

Because I'm a particular kind of idiot, it took me about 10 years of working in and around software engineering to understand that this is just not true. In many, many cases the most important goal for software is for it to take up some space in the market (either the external market or the internal corporate reputation market). In order for software to meet this goal, it must more or less function, but it doesn't need to be _correct_. It can crash, make mistakes, lose data, whatever.

In many situations this goal can be understood as "gathering capital." You write software to impress investors or to attract users or to convince other people in the corporate hierarchy that you are a going concern. Thus the valuable part of the software is not its correctness but its _surface_. How shiny is the software? Does it impress and/or please users? Does it impress investors?

If you've read Guy Debord you could have predicted this is how software would work. Software is spectacle.


> How shiny is the software? Does it impress and/or please users? Does it impress investors?

Yup. And in many cases (2) doesn't matter at all - just heaping portions of (1)+(3).

You know the kind of "software" I'm talking about -- the kind that exists (and does all kinds of things) on pitch decks, or in conversations with gullible industry press types. But in no other sense.


If we all wrote perfect code we'd be putting our QA departments out of work. Do we want that on our consciences?


You have a QA department? Nice, I'm jealous


I suppose there's a kind of hierarchy:

Can I get paid to work on this software?

Will customers buy this software?

Is this software fit for some purpose?

Is the software correct according to what the customer expects?

Writing correct software is just a straightforward way of meeting customer expectations, which is a way to ensure that the software is fit for some purpose, which is a way to convince customers to buy the software, which is a way to ensure that you'll continue to be employed working on the software. It's not the only way to get paid, though.

(Sometimes software is developed by end users, in which case you can skip over the "will customers buy this?" question.)


I hope that my life or preservation of the environment does not depend on the software you are working on being correct.

In fields where that is the case, methods for ensuring correctness are important.


Yes but I think we would say that incorrect software in those circumstances "doesn't function."

Most software is "usably wrong," which is good enough to build a career on.


Good enough because it's good enough (most of the time) for the user to do what they need to do. That's all that's needed. Not perfection.

Even more: Perfection takes a lot of time. Perfection therefore means less software reaching the user. So, is the imperfect software more useful to the user than no software at all? In most cases, yes, even with all the problems the software causes. (Because if it were otherwise, wouldn't the users stop using the software?)


You seem to be largely agreeing with the person you’re replying to. No need to disparage them.

TFA made a universal claim. OP replied “not true, here’s where it doesn’t hold.” Bringing up cases where it does hold doesn’t change anything.


It already does.


I really liked it, and I will probably use it as documentation for newcomers on my team. However, there is one point that I don't really agree with:

"it can be difficult to add tests to a large application that has been developed entirely without testing. And that is why testing should be first on your mind when building serious software."

I've been doing it all my life. Join a new project, without any tests, and start to add test code while getting familiar with the code base. Actually, I would say that "adding tests" to a code base is the easiest and safest way to get familiar with the code base itself.


.. they don't!


I'm not sure why the example used for formal verification is Z3, and not proof assistants like Coq or Agda. SMT solvers have made verification a lot more accessible, but Coq is strictly more expressive than Z3.


Coq requires thinking like a functional programmer, and most programmers are resistant to that paradigm for some reason.


Right now I'm reading "PSP, A Self-Improvement Process for Software Engineers" by Watts Humphrey (2005): https://resources.sei.cmu.edu/library/asset-view.cfm?assetID...

PSP = "Personal Software Process", the step before TSP, "Team Software Process".

Personal software process (PSP): https://en.wikipedia.org/wiki/Personal_software_process

Team software process (TSP): https://en.wikipedia.org/wiki/Team_software_process

It's there for anyone who is interested. Right now I'm skimming it to see what I can learn. Can be quite involved but can reduce errors and improve quality by orders of magnitude. (Proven.)

Given that, Humphrey repeatedly emphasizes that communication and understanding about what needs to be built are the most important parts, and that trying to test quality into larger systems can literally take years (of full-time work, by teams of dozens of developers).

Since I'm retired and reviewing all the ways that I wasted my life, for me it's only an intellectual exercise, but this could help someone still in the thick of it. Lots of food for thought.


I have recently realized that I am not progressing my skills in the way I wanted, partly because I cannot seem to get off the ground.

I read through the first chapter of PSP now, and this is exactly what I need! Thank you so much for sharing this.


> There are generally two main categories of tests. There are unit tests [and] Integration tests

Later on there is a mention of fuzzing. However, there is something better called property testing, where on failure there is automatic "shrinking" of input complexity to find a minimal failing test scenario.

For Erlang and Elixir I recommend the https://propertesting.com/ book and the PropEr test framework: https://github.com/proper-testing/proper.

As an example, here is the use of an advanced feature called targeted property based testing: https://proper-testing.github.io/tutorials/PropEr_testing_wi... used to test a labyrinth.

EDIT: as spockz mentioned below, the original idea was developed by QuviQ as QuickCheck http://www.quviq.com/products/erlang-quickcheck/. It's a commercial offering but there is a QuickCheck mini offered as a free download as well.


My understanding was that property tests and fuzzing are orthogonal in roughly the same way unit and integration tests are: you do property tests to ensure well-delimited pieces of your code satisfy well-defined constraints, and you fuzz to ensure the whole thing just doesn’t go completely nuts when faced with something unusual. But I could be wrong, because I never could figure out how to set up fuzzing for some sort of moderately involved realistic thing (a CSV parser? something like that).


I think property based testing could be used for integration tests as well. There are essentially two major steps to use it:

1) Define generators to produce input. These could start as random with some constraints. Say, "positive integers between 10 and 100" and then those can be combined in various ways with other generators like "positive integers between 10 and 100 in a tuple with a utf8 encoded binary between lengths 0 and 256" and so on.

2) Define properties which model your system. Given the inputs from generators, calling your application APIs should keep certain properties true. It should crash in only certain ways, or return only HTTP 200 responses, etc. These could be applied at any level from the simplest pure functions up to an HTTP API.

These steps don't have to be taken in that order. I end up starting with 2, going back to 1, then 2 again, etc.

The difference from basic fuzzing is that once you find a failure, it has a consistent and automatic way to backtrack and try to simplify the input down to a minimum that still causes the failure. So if a giant 1 GB worth of JSON fails the fuzz test, it can be hard to see why it failed. With property testing, however, it might be simplified down to a simple case with one UTF-8 encoded string of length 10 which has some specific emoji, or something like that.
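For the Java folks in the thread, the same two steps plus shrinking look roughly like this with jqwik (which, as far as I know, is the usual property-based testing pick on the JVM); the property here round-trips the BigInteger-based encoding sketched upthread, and the names are mine:

    import java.math.BigInteger;
    import java.util.Arrays;
    import net.jqwik.api.ForAll;
    import net.jqwik.api.Property;
    import net.jqwik.api.constraints.IntRange;

    class ConcatProperties {
        // Step 1: the annotations are the generators (positive ints here).
        // Step 2: the method body is the property; jqwik shrinks failing inputs.
        @Property
        boolean encodingRoundTrips(@ForAll @IntRange(min = 1) int p,
                                   @ForAll @IntRange(min = 1) int q) {
            byte[] out = encode(p, q);
            int pw = out[0] & 0xFF;
            BigInteger qp = new BigInteger(1, Arrays.copyOfRange(out, 1, out.length));
            BigInteger mask = BigInteger.ONE.shiftLeft(pw).subtract(BigInteger.ONE);
            return qp.and(mask).intValueExact() == p
                && qp.shiftRight(pw).intValueExact() == q;
        }

        // The function under test: width byte, then q concatenated in front of p.
        static byte[] encode(int p, int q) {
            int pw = BigInteger.valueOf(p).bitLength();
            BigInteger qp = BigInteger.valueOf(q).shiftLeft(pw).or(BigInteger.valueOf(p));
            byte[] body = qp.toByteArray();
            byte[] result = new byte[body.length + 1];
            result[0] = (byte) pw;
            System.arraycopy(body, 0, result, 1, body.length);
            return result;
        }
    }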


The original is QuickCheck for Erlang, which also has a Haskell flavour. It has inspired ScalaCheck for Scala.

I test pure logic with property based testing and simple integration testing to ensure all the steps are aligned properly.


Good point. I edited my comment to mention QuviQ and their QuickCheck product.


Those are separate classifications, right? A unit test can be property-based, and an integration test can be accomplished through fuzzing, and vice versa.


It's almost an extra dimension, yeah. I was just highlighting that there is something that is a strict improvement over fuzzing and it's not just academic but available for practical use today.


+1


The article doesn't mention the additional problems with stateful software, which account for a lot of the worst problems in practice.

Consider an application with persistent state. Version 1 has a bug that introduces some slight inconsistency in the state file. Version 7 reads the state file, assumes that the state file is consistent, and ends up producing crazy results. Version 7 is "correct" but still fails.

One answer is to always validate the state file, but that may be impractical due to size or complexity.

A better answer is to use a database that offers declarative constraints that help prevent inconsistencies. This is such a good solution that something written in PHP could be very robust in practice if it uses a good database, whereas something written in Haskell that uses a database without good constraints might fail miserably.


You don't need a full database, you just need well-specified serialization/deserialization. Typically this includes explicit versioning mechanisms, so the app can be aware of what version it is dealing with and convert the format accordingly.
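A bare-bones sketch of what "explicit versioning" can mean in practice (all the names and fields here are made up for illustration): the first byte of the state file says which schema wrote it, and older versions are upgraded - and repaired - on read instead of being trusted.

    import java.io.DataInputStream;
    import java.io.IOException;

    final class StateFile {
        static final class State {
            final long counter;
            final String owner;
            State(long counter, String owner) { this.counter = counter; this.owner = owner; }
        }

        static State read(DataInputStream in) throws IOException {
            int version = in.readUnsignedByte();   // explicit version tag, written by the app
            switch (version) {
                case 1:
                    // v1 had no owner field and a known bug that could leave a negative
                    // counter; repair it during the upgrade rather than assume consistency.
                    return new State(Math.max(0, in.readLong()), "unknown");
                case 2:
                    return new State(in.readLong(), in.readUTF());
                default:
                    throw new IOException("unknown state file version: " + version);
            }
        }
    }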


ser/de only solves parsing problems and not much else.

What about using duplicate user IDs?

Or maybe many objects in the system need both the employee ID and the name (and the employee ID determines the name, obviously), and the employee changes their name, but it only updates some objects and not all?

There are lots of ways data can be subtly inconsistent and a database is a big help preventing it.


I think a lot of the common memes around programmers like these are due to the wrong-think gatekeeping during the interview process that everybody is copying from FAANG.

I have 5-10 years of programming experience and find an endless number of bugs in other developers' work, at any and all skill levels.

On occasion, something like a billing calculation will have a function doing the opposite of what it's intended to do, but somehow nobody has caught it in 5 years. I've also seen ancient security reports that just have a chunk of code that doesn't work the way it sounds like it should, causing 25% of results to be missed. These are just a couple of actual examples.

I focus on generally not screwing up my work like that, as well as finding these problems in my team's code. It's mind-blowing to me that this has absolutely zero value, though. I and others like me are gatekept out of a lot of work since we process things differently. Developers can't seem to imagine a world where somebody thinks differently than they do, and assume memorizing leetcode is just as easy for everybody as it is for them. "If you can't talk me through what you're doing while writing your memorized, perfectly compilable code on a whiteboard or Google Doc, I guess you're either just lazy and not really invested, or a bad developer. Probably both."

The only thing that technically matters for getting into dev roles is memorizing leetcode problems in order to pass the interview, so that A) you have a job at all (even small companies are doing these now) and B) you can get your actual salary corrected every couple of years.

If it's not memorizing leetcode, it's how many (inefficient) lines of code did you contribute? (Thankfully this is finally starting to be phased out as a measurement AND bragging point.)

The people who make it through all this then go on to store their critical world-infrastructure Exchange version number in a 32-bit integer because "It worked and compiled when I tested it, so it's probably good enough." Nobody dares test it due to the toxic hostility they'd get back, or because a senior dev wrote it, or "I ran the code and it compiled for me, so it's good."

There's just so much talking down to others and gatekeeping about how leetcode solvers are superior and how it's the only possible starting block for a "real dev". I can't believe the frequency of simple yet major mistakes these people go on to actually make after they spend so much time bragging about it.


Sorely lacking from this article is the topic of static type systems and immutable data structures.

One guarantees the absence of certain classes of bugs; the other reduces the number of moving parts in a system.


I'm reminded of what I'm going to call a poem, but which isn't a poem, but more of a long saying. I can't remember the exact wording (if someone remembers and has the link, please link it!) but roughly:

If you tested your program, congratulations. Your program is now tested, incorrectly.

If you tested your program, and verified that the tests are correct, congratulations. Your program is now verified to do the wrong thing, correctly.

It has at least two other lines which I can't even think of the paraphrasing for.


But from a scientific standpoint, do you think software can be 100% correct? Bugs will eventually be found.

The best bet, when trying to write (100%) correct code, is thorough testing and hoping for the best, because when you are handling a million lines of code, errors will definitely creep in.

Personally, I feel achieving 100% correct code is impossible, since developers are always updating the code and adding new features to it.

Hence a team should be put in place to create patches quickly when bugs are found.


> But from a scientific standpoint, do you think a software can be 100% correct?

Sure it can. I've written 100% correct code using Coq. For example, I wrote a relatively simple program (~1.7k LoC) to interpret a simple programming language, but it was definitely correct (certified by a machine-checked proof).

Of course, for larger programs it's much harder to do that. But it's just a matter of how much time you're willing to invest.


On some level, that's sort of like asking whether we can be 100% sure that the solution to a math problem is correct. I mean, I'm pretty sure that 2+3=5, but what if I made a mistake somewhere? What if I'm falling prey to some cognitive bias that makes me blind to certain math errors? What if all of humanity is wrong about this math problem?

If we can be convinced that we're 100% sure that 2+3=5, then what about a more complicated problem? What's the most complex math problem we can be sure about? What if we have to use a computer to check its correctness?

(In general, I think it's usually good enough to say we're pretty sure something is correct if we have some solid basis for believing that it is. I'm okay with being 99.99% sure that 2+3=5, but in the world of software "I'm 50% sure this software is correct" is a standard that most software I use on a daily basis fails to meet.)


CompCert is an example of 100% proven correct software. There are other examples out there. Verification tools are getting better every day, making it less costly to prove code correct.


In hardware (chip) design, code verification is done by separate engineers. Design engineers have some test code to kick the tyres, but sign-off is done independently and rigorously, with functional coverage linked to specification documentation. This is a cautious approach, but hardware bugs can cost multiple millions of dollars, so it pays to have this level of testing.


On a side note, I always find it interesting to see the static type and algorithm fans tout how correct their code is. Great, we now have TypeScript everywhere… it’s still amazingly difficult to test and validate user interfaces, styles and layouts.


Interesting that the 2 books referenced at the end are from 2008 and 2012 (I read them when I was learning more about testing) - nothing new since then?


I'm surprised at the lack of stack overflow jokes in this thread.


Stack overflow?


It took me a minute, I'm not gonna lie.



