This article points out other simplifications that can be made to help with this error, but frankly the core problem is using insanely verbose boilerplate to handle errors. At bare minimum, the "if ((error = ...) != success) goto fail;" pattern should have been hidden behind a macro such as "attempt(...)"; just refactoring the code to use if statements differently is not sufficient. Further, in a more reasonable language (C++), the "goto fail;" paradigm (which is being used to free manually managed memory) could be replaced by a destructor (alternatively, in languages like D, a scope guard), which simplifies the logic. One could also use exceptions (or, in a language like Haskell, a Maybe monad) to structure the error handling and remove the boilerplate entirely (which also has the incredibly valuable side effect of making the error handling impossible to avoid: forgetting to check a return value, or checking the return value against the wrong error constants, is a serious issue that leads to security bugs; Rage Against the Cage is a simple example from Android). There is really no excuse to be writing code like this in C when it is just asking for these kinds of mistakes by not providing even the barest of abstractions to keep the programmer from ending up with code that is 50% boilerplate :(. But again, lest people get caught up in "I hate C++" land, even C provides something to help the programmer not type this code a million times (#define macros, which still leave some boilerplate in the code, but drop the percentage dramatically, removing any impetus to end up with a pattern here that requires multiple lines of code to express), which, despite being an incomplete solution to the core problems of this pattern of error checking, would make this code infinitely easier to edit without introducing errors in the boilerplate.
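To make that concrete, here is a minimal sketch of the kind of macro I mean (the name "attempt", the local "err" variable, and the "fail:" label are purely illustrative; a real version would be adapted to the surrounding function):

    /* Hypothetical helper: collapses the assign/compare/goto boilerplate into
     * a single token, assuming a local "err" and a "fail:" cleanup label. */
    #define attempt(call)              \
        do {                           \
            if ((err = (call)) != 0)   \
                goto fail;             \
        } while (0)

    /* before: if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
     *             goto fail;
     * after:  attempt(SSLHashSHA1.update(&hashCtx, &signedParams));          */

Because the check and the jump now live on one line, there is no second line to forget, duplicate, or mis-merge.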
I'm going to disagree with you on the "if ((error = ...) != success) goto fail;" pattern. I've written a lot of C/C++ during my career, but mostly C# in the last ten years, and that pattern looks totally obvious to me. This isn't because I'm super-smart (I'm not) but because I've internalised the language rules and common patterns and I don't have to think hard about what such a code line is doing. It just seems like a normal C/C++ construct. I understand that it might be more opaque to a Ruby/Scala/whatever dev, but they probably wouldn't be poking around in libssl unless they knew C/C++.
To me, wrapping such a simple test in a macro just obscures what is going on by introducing unnecessary abstraction. Making the decision about whether to use a macro in this case seems to me rather like the decision to use operator overloading in C++. There are occasions when you might want to do it to hide complexity, but you'd better have a good reason.
The issue here isn't that the code is confusing or somehow not "obvious" in function: the issue is that there are numerous ways we can mistype it, especially as we (apparently) are prone to copy/pasting it because it involves typing an additional 30 keystrokes (in this case, assuming "err" and "0"). The goal isn't to "hide complexity", the goal is to "remove things the developer can do wrong".
Trust me: I glance at that code, and it is 100% clear to me in that moment what it is doing and why it is doing it--I have been programming primarily in C/C++ now for almost 20 years--but I simply do not believe you that an experienced C/C++ programmer glances at this code and knows 100% for certain that it is correct. If nothing else, clearly the person who wrote this code screwed it up ;P.
This pattern just lends itself to silly mistakes: you might check the error using the wrong constant (!= 0 when you need == -1; this could even work temporarily!) or think that a function can't return an error when it actually can (the setuid mistake that burned Android, see Rage Against the Cage); by using structured error handling you aren't just "hiding" complexity: you are removing it.
Now, I happily admit (and already did in my original comment) that the macro doesn't solve all of these problems as well as "use a language that provides better abstractions (even C++!)"; however, it does mean that you can't accidentally miss the set of parentheses (assigning the comparison result), double up the equal sign (comparing instead of assigning), add an extra semicolon (breaking the comparison, so as to always "goto fail;"), copy/paste the boilerplate incorrectly (which you might have even done in a misguided attempt to avoid the previous errors ;P) leaving you with a second copy of "goto fail;" (as the developer here probably did; it might also have been a merge failure: thankfully, that also becomes impossible, as the error check now is naturally part of the same line as the code being checked), or otherwise make any silly "it still compiles, it looks almost identical, but now it doesn't work" errors. That's valuable :/.
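For anyone who hasn't stared at enough of this boilerplate, these are the kinds of near-identical mutations I mean (f and ctx are hypothetical stand-ins); every one of them compiles:

    if (err = f(&ctx) != 0)            /* missing parentheses: err gets 0 or 1 */
        goto fail;

    if ((err == f(&ctx)) != 0)         /* doubled '=': compares, never assigns */
        goto fail;

    if ((err = f(&ctx)) != 0);         /* stray ';': the goto now runs every time */
        goto fail;

    if ((err = f(&ctx)) != 0)
        goto fail;
        goto fail;                     /* duplicated line: everything below is dead */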
"but I simply do not believe you that an experienced C/C++ programmer glances at this code and knows 100% for certain that it is correct."
I agree, but then I didn't actually say that I would know it was correct (to any percentage) by glancing at it. What I tried to say was that the "boilerplate" does not detract from the code exposing its meaning or intention.
Again, I agree that the wrong error constant could be used. But that's also true if a macro is used - especially if the constant is embedded in the macro, which is defined elsewhere.
I didn't want to get into your point about whether this would be better written in another language, because we are where we are. libssl has been around for a while and has accumulated plenty of dependencies. I don't think a rewrite is going to happen any time soon.
You seem to have ignored the long list of "possible typos similar to this one" that were my arguments for why the macro was better than having the developer copy/paste the code every time. The only argument you seem to be making is: "abstraction is bad"; I mean, this same argument (that abstraction hides functionality and values behind opaque names) is also the reason why developers should not write functions...
It's horrible code that seems to indicate a lack of production coding standards.
Errors, warnings and assertions need to be handled semantically sensibly, obviously and consistently.
In most functions, errors should be handled individually and returned as they happen, and the only condition at the very bottom should be success.
My personal fav is anomaly(...), which checks for a nonfatal but suspicious condition. It always prints a warning to STDERR when the check fails, even when compiled with -DNDEBUG.
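In case it helps anyone, a sketch of what such a macro can look like (the real thing presumably also feeds a logger or a counter; this is just the shape):

    #include <stdio.h>

    /* anomaly(): non-fatal sanity check that is always reported on stderr,
     * even in release builds where assert() is compiled out by -DNDEBUG. */
    #define anomaly(cond)                                                  \
        do {                                                               \
            if (!(cond))                                                   \
                fprintf(stderr, "anomaly: %s failed at %s:%d\n",           \
                        #cond, __FILE__, __LINE__);                        \
        } while (0)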
One of my personal "favorites" is this gem from OpenSSH:
/* No MIN_SIZEOF here - we absolutely *must not* truncate the
* username (XXX - so check for trunc!) */
strlcpy(li->username, pw->pw_name, sizeof(li->username));
I don't think that is anything out of the ordinary, except that if pw->pw_name is longer than sizeof(li->username) it will get truncated. I was more thinking of error checking macros and error handling.
Basically the remark was that the comment went from discussing error handling to the (much) broader scope of 'C and C++ done "right"', and as examples five complete, large open-source projects were dumped on us to go and find the relevant code ourselves.
Just because the code doesn't misuse memory doesn't mean the code is correct. In this case, the return value of strlcpy should be checked to verify that the username is not truncated as it is added to this data structure. The comment even states that the truncation must be avoided, but rather than adding a check there is simply an XXX ;P.
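Checking it is hardly any extra code (a sketch; strlcpy returns the length of the source string, so a result >= the destination size means the copy was cut short):

    if (strlcpy(li->username, pw->pw_name, sizeof(li->username)) >=
            sizeof(li->username)) {
        /* the username was truncated: report it and refuse the record,
         * rather than silently storing a different user's name prefix */
    }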
If you want to get good at something, you're going to have to work at it. No one else can give you a shortcut to common sense that comes with mastery. Only pointers. That's just the way life works.
I can figure out how to look up these projects myself. The point is, if you are going to give examples on error handling, be more specific. I don't need any general "life lessons" here; the point is that it looked like you were trying to make a point without actually having one. I.e., you did not actually have any examples, you only made it appear so.
I think the second version with the braces really misses the point of the "use braces" argument. From my perspective, the main error cases for that code are merge conflicts and accidental line duplication. In either of those cases the new "goto fail" line would end up inside the braces, where it would be harmless dead code.
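That is, with habitual braces the classic accidents land somewhere harmless (reusing the widely-quoted line for illustration):

    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) {
        goto fail;
        goto fail;   /* the duplicate is now unreachable dead code inside the
                        block, not an unconditional jump that skips the
                        remaining checks */
    }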
The programmer did something to cause doubling of that particular line. Habitual braces placed at the onset of writing the if()... would have made that act, whatever it was, obvious and obviously wrong.
The 2nd version fails as an example because it hides the act of creation then surprises you with "here, look at this, anything wrong?" with the author knowing full well he's making it wrong.
My takeaway from this whole discussion is: assume failure, prove success. The offending function failed by presuming success; it did not positively confirm every criterion for success, and even the "fail:" section is executed on success.
I came here to make your third point. I feel strongly that code should be written to "fail safe"-- error state should be assumed until calls return otherwise, etc.
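As a sketch of what that discipline looks like in C (check_hashes, verify_signature, and release_buffers are hypothetical stand-ins for the real work):

    static OSStatus verify_exchange(SSLContext *ctx)
    {
        OSStatus err = -1;               /* assume failure from the outset */

        if (check_hashes(ctx) != 0)
            goto fail;
        if (verify_signature(ctx) != 0)
            goto fail;

        err = 0;                         /* success asserted in exactly one place */
    fail:
        release_buffers(ctx);            /* cleanup runs on both paths */
        return err;
    }

A stray extra goto in a function shaped like this can still skip a check, but it can no longer turn a failure into a success, because success is never the default.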
If this had been Python it wouldn't have been an issue ;)
I find it hard to believe that this code had a proper peer review before being committed. A 2nd and 3rd set of eyes experienced at reading C should have spotted this. Peer review is especially important when dealing with core security components like this where simple mistakes can have huge consequences.
Of course, different languages come with different trade-offs, but I wholeheartedly agree that Python picked the good side of this particular trade-off.
If proper indentation is so critical to a human parsing of the code that every style guide in the world demands some type of indentation (note the lack of style guides demanding a complete lack of indentation), then why not use that very same convention to convey that very same information to the compiler?
The C family of languages is on the unfortunate side of this trade-off: there's one method of conveying statement blocks to the compiler (braces) and another of conveying them to the reader (the indentation convention).
TL;DR: Multiple channels with redundant information run the risk of disagreeing with each other. I don't know if anyone's come up with a catchy phrase for this, or coined some sort of "law", but it's very real. As soon as you have redundancy of information, you automatically inherit the problem of synchronizing the information between channels.
If this had been Python, a missing/extra indent might've caused a similar issue, or even a duplicate function call. What evidence is there to say that if a C programmer can sometimes miss duplicate lines of code, a Python programmer won't miss duplicate lines of code or a missing/extra indent?
I find it hard to believe that this code had a proper peer review before being committed. A 2nd and 3rd set of eyes experienced at reading C should have spotted this.
They could've spotted this and, maybe due to some other factors (e.g. some people just don't want to point out others' mistakes, as in hierarchical cultures), let it pass; or the other reviewers were at the same competency level as the one who wrote the code.
Not sure if it is/was, but having at least three people review code before merge should be standard practice for security critical stuff like this. Something I plan to implement in the future.
The original website seems to have gone offline. But based on the title, I think the lesson should be to have more stringent reviews and tests. Things which can't be checked statically at compile time have to be thoroughly reviewed and tested, and a coverage tool used to check for completeness (at the very least).
I have seen teams following thorough review and testing processes in far less critical software!
Agreed, this is a case where a small number of test cases would have been sufficient to cover the critical parts of the code and avoid this particular error and many like it.
Some nice suggestions in the review, and most importantly the author leaves the programmer out of it and focuses on the code... except for one bit where the author cites the opinion of the masses (implying they share the same views). It's important that when reviewing code you only talk about the code and not the people or person who wrote it. It will happen to you one day and someone else will come along and make all sorts of remarks about your character and ability. And it will be completely unjust and unwarranted. So don't do it.
The conclusion I found wanting. "Should" is a word, like "obvious", that is best avoided. It is a sign of weak, wishful thinking and stinks of condescension. Could, would, or should have... it happened and was dealt with, it seems, in an appropriate manner. Armed with good suggestions on how to avoid it in the future, perhaps we won't see the same mistake made again (hah).
Did anyone notice that the "fail:" section is misnamed? It's a completion section, run regardless of failure or success. Marked as such, the reader assumes "if it succeeds it won't go here" before actually (if at all) reading it, processing all the logic, and concluding that it's run regardless.
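Renaming it costs nothing and removes the misdirection; the tail of the function (as widely quoted) would simply become:

    cleanup:                             /* runs on success and failure alike */
        SSLFreeBuffer(&signedHashes);
        SSLFreeBuffer(&hashCtx);
        return err;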
The question the author seems to ignore is how the second "goto fail" appeared in the first place. In all likelihood it was caused by an erroneous copy + paste where the programmer accidentally included the `goto fail;` from above the `if`, like so:
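(the relevant stretch of sslKeyExchange.c, as widely quoted:)

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;                 /* the stray copy: unconditional, so the
                                      final hash check below is never reached */
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;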
This is a very poor form of checksum on copy/paste mistakes. As I describe in my top-level comment, a more complete solution to this problem is to avoid the developer ever having to copy/paste multiple lines of silly error-handling boilerplate in the first place: hell, the programmer shouldn't be feeling the need to copy/paste at all, as that's just asking for trouble :(. Instead, a #define macro can be used that would replace the "if ((error = ...) != success) goto fail;" pattern with "attempt(...);" (or, for an even more complete solution that solves other issues that are possible in this kind of code, such as resource management mistakes caused by using goto for unwind semantics or simply forgetting to check an error entirely, C++ destructors and exceptions can entirely remove the error handling boilerplate from this function, letting the programmer concentrate on behavior and not error conditions).
I just don't think it's fair to say that braces wouldn't have helped in this situation, because they would have. That doesn't mean that a pattern like this was a good solution in the first place of course.
I've done that type of copy/paste error - inadvertently duplicating a line - more than once when using an unfamiliar and "helpful" IDE. I hate it when an IDE screen flashes and I'm left wondering what the HELL did that just do???
It would be interesting to know how the whitespace (tabs vs spaces) before the extra goto compares to that on the other goto lines. This might provide a clue as to whether copy/paste was involved.
On the subject of how the extra line appeared in the first place, my take on possible causes are, in order of likelihood:
1. Line added during developer testing to force a fail and not removed.
2. Copy/paste error during construction.
3. Artefact from a bad code merge.
4. Deliberately introduced by NSA mole.
Presumably, once someone goes through the version control history, the who and when will be known - though maybe not publicly. I'm not sure we'll ever know why, though.
> It would be interesting to know how the whitespace (tabs vs spaces) before the extra goto compares to that on the other goto lines.
(in case anyone is curious, the "goto fail" lines in that part of the code are identical: all spaces; and the file in general does have an unfortunate mishmash of spaces and tabs)
I tend to agree with the OP, the problem is not the braces per se. IMO, better would be a coding convention to skip curly braces, but only if the entire statement is on one line. Then the bug would have stood out more prominently.
Otherwise, the OP is right on the money; there are multiple levels that would have caught this bug - a layout convention, a linter, a code review, pairing, a unit test, an integration test. Or just a refactor of the big old method for clarity.
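For comparison, under that one-line convention (again using the widely-quoted lines) each legitimate check collapses onto a single line and an orphaned goto has nowhere to hide:

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0) goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) goto fail;
    goto fail;                       /* immediately looks wrong on its own line */
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) goto fail;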
This. In addition, a linter would have caught his example with braces because of the improper indent. And, IMHO, without that indent it becomes glaringly obvious.
Why are we blaming curly braces instead of unit tests? If the code was always going to return a success then that tells me it wasn't unit tested /at all/.
Ah yes, the use of goto conjures up the 800-year-old essay in the room.
The way they use it is widely understood and accepted in C error handling. It's also not the only way you can screw up, which is a good reason why you should add a static analyser to your build process.