Hacker News
Why was "goto fail;" added without any other change to that part of the code?
61 points by 0x006A on Feb 28, 2014 | 85 comments
Looking at the diff between the two versions of sslKeyExchange.c released by Apple (http://opensource.apple.com/source/Security/Security-55471/libsecurity_ssl/lib/sslKeyExchange.c and http://opensource.apple.com/source/Security/Security-55179.13/libsecurity_ssl/lib/sslKeyExchange.c), I was trying to come up with a reasonable explanation of how this could have happened, but failed.

Here is the relevant part of the diff:

  @@ -627,6 +628,7 @@
           goto fail;
       if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
           goto fail;
  +        goto fail;
       if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
           goto fail;
How could this ever happen? It does not look like a copy & paste error as suggested in other places, and it does not look like refactoring. Was it added intentionally to test something and committed by accident? Is there any possible non-malicious explanation someone could come up with?

Stop this nonsense. There are plainly dozens of ways this could have happened accidentally. Literally every developer has made a similar stupid mistake, and literally every organisation has released stupidly broken code.

Yes, there should have been code review, static analysis, and testing in place to prevent that. That'll probably start happening, and you can bet that there will be serious discussions internally about what happened.

If this was the action of a malicious government agency, then it was horribly hamfisted execution. It might offer plausible deniability in some sense, but it's hard to imagine a situation where such an action would be worthwhile, given the resources at the disposal of any such actor.

>it's hard to imagine a situation where such an action would be worthwhile, given the resources at the disposal of any such actor.

This needs more defense. I find it hard to imagine a situation where such a minimal and deniable action with such a massive effect wouldn't be worthwhile.

No, it doesn't need a defense. There is no evidence to back up the idea that the bug was malicious. People that have worked on large C codebases have seen crazier things than this. Routinely.

I understand that Rails developers believe that all the TLS stacks must have dense rspec test coverage and zero-warnings static analysis passes in their builds, but that isn't the reality for any TLS stack. These aren't Rails apps (and, hate to break it to some of you, but your Rails test coverage isn't doing as much for your security as you'd like to think it is).

This is a nutball conspiracy theory, another example of the tech "community" eating its own rather than focusing on anything that would help mitigate state-sponsored surveillance. And it's happening solely in the service of an "exciting" narrative. The people promoting this bullshit conspiracy theory are, by and large, doing it because they want it to be true.

1. It is certain that NSA has some very smart people playing with the sources of the Apple crypto-modules.

2. If you're an NSA boss, and some of your experts told you that you can break SSL in Apple products by adding one line that could almost certainly be attributed to an inconspicuous human error, would you try to make a deal with Apple?

3. If you're an Apple boss, and the NSA offers you cash (or other benefits, like competitor intelligence) for adding a plausibly deniable bug to your code, would you turn it down?

There is no direct evidence for this case, sure. There is however ample evidence in the Snowden docs that this scenario happens too often for this to be called bullshit conspiracy theory.

Remember how Diaspora was supposed to be the private peer-to-peer encrypted Facebook, but peers' private keys could be read or overwritten by anyone on the Internet? Doesn't your argument suggest "Well, we can't rule out the NSA having a man on the inside trying to undermine their encryption"? Heck, why not go all the way: the NSA funded Diaspora to bring Facebook to the negotiating table?

Down this path lies madness. Software has bugs.

Yes, we can't rule that out. But we could clearly see that the Diaspora codebase screamed incompetence. Apple, on the other hand, has some very high-quality products and decades of experience.

Yes, it is quite possible that this is a simple bug. The other option is also quite possible :)

It is equally possible that Apple is a giant conspiracy dedicated to concealing the grey aliens who actually control human civilization and are harvesting our brainwaves, which actually function as the raw compute for a giant intergalactic payments call center application, via Flappy Bird. Think about it.

That's not equally possible, due to Occam's razor: http://en.wikipedia.org/wiki/Occam's_razor Plus, the security services have a proven ability and intent. Be serious now.

People seem to forget that NSA is not the only player in town. When you plant weaknesses into things, you need to make sure that you're not actually helping your adversaries. You need to carefully weigh the chance/cost of discovery by an adversary against what you get out of the weakness. The perfect case is when you're guaranteed to own the weakness, à la Dual_EC_DRBG.

Consider that the NSA has already been known to burn stolen certificates for malware code signing. It's therefore not a stretch to assume they can easily MITM TLS without needing the help from bugs.

If they planted this bug, they would have been effectively democratizing TLS MITM to virtually everyone. This would help their adversaries more than it would help them, so I'm not convinced. It's easier for me to buy that the Chrome Pinkie Pie bugs were planted, due to the hardness of their discovery, than this could ever be.

None of this is evidence. This is all innuendo. I could apply the exact same set of arguments to the Rails YAML bug, or to whatever the last Chrome bug Pinkie Pie got working was.

Nobody doubts your ability to spin some coherent-sounding story about the TLS bug. It's not a hard game to play. People have been playing it for centuries. How about you try a more fun topic, like alien landings?

You should distinguish between a plausible conspiracy theory and a bullshit conspiracy theory. Conspiracies happen quite often; having no evidence just makes them higher quality.

> If you're an Apple boss, and the NSA offers you cash (or other benefits, like competitor intelligence) for adding a plausibly deniable bug to your code, would you turn it down?

With other companies, I might find this plausible, but I do not believe Apple is very hard-up for cash or direction on which way the market is going. It seems to me that they have more to lose from a high-profile security breach (say, if this vulnerability had been used in a mass theft) than they do to gain from anything the NSA could offer them.

The oddest thing about it is that people who ought to know better are not straight up calling it the tinfoilhattery that it is. Schneier saying things like 'if I were doing this deliberately, that's how I'd do it'. That's how you'd do it and then you'd release the source? Why not patch the binary? Hell, why not patch the compiler, Ken Thompson-style, if you're such a devious insider? It's grade-A Chewbacca defense.

>it's hard to imagine a situation where such an action would be worthwhile, given the resources at the disposal of any such actor.

This needs a defense.

>There is no evidence to back up the idea that the bug was malicious.

This is completely unrelated to what I said needs a defense. It's like you weren't even responding to me.

edit: I don't know Rails.

As if read from a script, the discussion now flees to abstraction and hypothetical. I'm not interested.

I don't know what you're talking about, or what you're responding to that I've said, so your lack of interest is not hurting my feelings.

This needs more defense. I find it hard to imagine a situation where such a minimal and deniable action with such a massive effect wouldn't be worthwhile.

Because the goal of any such agency is not "let's render a minority of computing devices insecure for a short period under certain circumstances." That entire argument hinges on the idea that taking this action would have achieved some kind of desirable goal, when it simply would not have.

In fact, the whole thing is a massive distraction. There are myriad easier ways to conduct individual surveillance if that was the goal, and I would be much more concerned about the fact that these same agents already apparently have legal carte blanche to intercept traffic in other ways.


I think that your defense hinges on this being the only thing that they are doing. As one of many things, with no risk and little expended effort, why not?

>There are myriad easier ways to conduct individual surveillance if that was the goal

What's easier than getting a call from somebody placed at a tech company, indicating that one of the most security-critical pieces of code - which had no test coverage and wasn't statically analyzed before going into the build - has a spot where an errant line could be inserted, in a way that could later be argued to be accidental?

The total marginal cost of that hypothetical situation would have been saying 'ok.'

>these same agents already apparently have legal carte blanche to intercept traffic in other ways.

intercepting traffic != decoding traffic

How should a malicious actor have better crafted an exploit of equivalent scope and potency?

Considering the possibility that any system managed from a compromised device could be compromised to any degree, how does the use of static code analysis and testing in the future uncompromise those systems?

Short of finding a unicorn and seeing who can ride it, there's no guarantee that a system isn't pwnd or connected to one that's pwnd. The certificate failure means binaries could have been pushed.

The problem is that you can't trust trust.


Considering the possibility that any system managed from a compromised device could be compromised to any degree, how does the use of static code analysis and testing in the future uncompromise those systems?

It doesn't. But since you were relying on a closed-source system in the first place, you are already screwed - if that's the level of security you require.

It wasn't a closed-source system. This bug was in public source code, sitting there for anyone to see it. Nobody did.

I'm aware that this particular code was - but presumably whichever hypothetical person has arbitrary binaries pushed to them in the grandparent's scenario was using a Mac or iOS.

Server A is trusted by server B. Danny uses his iPhone to manage server A - i.e. he accesses server A in a privileged state from his iPhone. If we treat his iPhone as compromised, then the rest of the chain of trust should be treated as compromised all the way through server B.

And that's the more difficult exploit to engineer. If Danny is using OS X for administration and we treat his installation as compromised, we get a much richer stack for exploiting server A.

If Apple isn't dogfooding Macs in its development stack, I'd be surprised. Honest mistake theories drawing upon the mechanisms of Xcode fall away. So, yes, in the big picture Apple is fucked so far as security goes. Short of a source code audit- and from whence come auditors of divine perfection- trust consists of no more than wishful thoughts that the breach was an honest error that went unnoticed by everyone who might have wished to exploit it.

Have you ever looked through the code in OpenSSL?

Whenever someone brings up the "trusting trust" thingy, I feel compelled to point out the antidote: http://www.dwheeler.com/trusting-trust/

> If this was the action of a malicious government agency, then it was horribly hamfisted execution

If this was the action of a malicious government agency, you're the proof it was an awesome move, given your "Stop this nonsense".

That's what they expect you to think ;)

It's not true that there was no other change in the code -- there was a change eight lines earlier. It could be a mismerge of the latest upstream code. There was refactoring nearby, and people can make stupid mistakes while refactoring.

I disagree that there's no "reasonable explanation", and so do most other people who've looked at it. That doesn't mean it wasn't malicious, but it means that it has reasonably plausible deniability. (Of course, if you were being malicious, plausible deniability might be important to you.)

++ on mismerge. The change likely won't impact the output binary - compiler should issue a warning, which devs might ignore; then the optimizer would remove it as unreachable.

Perhaps the developer deliberately allowed that mismerge to give himself an aura of incompetence/sloppiness, enhancing his plausible deniability when it is discovered that one of his other changes led to an insidious vulnerability. But more likely it was just an accident - if he were malicious, the effect of bringing attention to his code would be highly undesirable.

> The change likely won't impact the output binary - compiler should issue a warning, which devs might ignore; then the optimizer would remove it as unreachable.

It's not unreachable - it always executes if the if condition is false.

D'oh. That's what I get for not paying attention and looking at the indentation instead of for braces. Still, my bet's on mismerge first - though it'd be worth reviewing that committer's other changes.

The unconditional "goto fail" makes the code beneath that condition but before the "fail:" label unreachable code.

yes, the code above looks like a search and replace of ReadyHash(&SSLHashSHA1, &hashCtx, ctx)) with ReadyHash(&SSLHashSHA1, &hashCtx, ctx)), but while doing that it's unlikely you'd touch the goto line.

I'm not saying there is no reasonable explanation, just that I am not able to come up with one. Many explanations I have seen so far require additional code changes in that block and make no sense to me looking at the diff.

Your search and replace after copy and paste seems to have failed.

A mismerge looks like a possible explanation; do you know what the upstream project/codebase for this is? It would be interesting to look at that.

If you are wondering whether there is a massive conspiracy to make software weaker and to hijack the work of everyone writing networked software and construct a massive surveillance apparatus and use it against enemies of Freedom™ and randomly chosen accidental targets: the answer is yes. It has been demonstrated beyond doubt.

If you're wondering whether this particular code is a consequence of all that: probably not, but if it were, it wouldn't prove anything that isn't already known.

A lot of code editors (Xcode doesn't by default, but you can easily configure it to) have a keyboard shortcut for duplicating the current line. I have plenty of times accidentally made changes at the wrong place in a file because I was mistaken about where the cursor was.

The lesson is to always make sure to look at your diffs before you commit your changes.

And to have someone else look at your diffs and sign off on them before it's merged into master.

Google / Chrome does this, and so should everyone else, even more so if they're dealing with security.

That's a good measure too, although I think with a mistake like this, the person writing the code might be more likely to spot the mistake than someone else.

It also makes me wonder if the diff display shouldn't have a specific notation for duplicated lines, since they are a special case that can easily be mistaken for a trivial change to a line with the way diffs are typically shown.

Strongly agree with the lesson. I got burned once, produced a security hole by merging a messy config file. The problem was quickly detected, but I have not checked in a single commit after that without first going through all the diffs. Plus, I sometimes notice possible improvements during such a review. It's very helpful.

Looks like they also introduced a reference leak on allocation failure:

    @@ -198,10 +198,8 @@
            sslDebugLog("SSLEncodeRSAKeyParams: modulus len=%ld, exponent len=%ld\n",
                    modulusLength, exponentLength);
         OSStatus err;
    -    if ((err = SSLAllocBuffer(keyParams, 
    -                       modulusLength + exponentLength + 4)) != 0) {
    -        CFReleaseSafe(exponent);
    -        CFReleaseSafe(modulus);
    +    if ((err = SSLAllocBuffer(keyParams,
    +                       modulusLength + exponentLength + 4, ctx)) != 0) {
             return err;
         uint8_t *charPtr = keyParams->data;

Note the removed CFReleaseSafe(exponent) and modulus. All other return paths in SSLEncodeRSAKeyParams() call CFRelease(exponent) and modulus.

This is why you use goto fail and not early returns.

Maybe another check was done between those two goto fails but was later deleted because it didn't make sense. The extra goto fail was left behind.

Yes, people make mistakes... that is why code in a file called sslKeyExchange.c, ffs, can be expected to be reviewed by another person at Apple BEFORE the commit enters a release candidate. A double goto fail; is something that sticks out to anyone looking at that commit line by line. Seems like Apple handles application security like any 6-dude valley startup.

My commit suggestion: - goto fail; + fail;

More importantly, why doesn't Apple run test suites and basic checks? Test-driven development must be everywhere.

Test driven development is actually quite uncommon in many parts of the industry.

Sure the startup culture and new breed (new bubble?) tech companies mostly emphasise it, but among older companies it's much less common for a few reasons:

1. It just wasn't considered normal practice until fairly recently.

2. They often use 'lower level' languages, and as @raverbashing mentions, they're more difficult to test.

3. The companies are much slower moving, so arguably have less risk in not having automated testing. Automated testing has more benefits in agile than it does in the waterfall methodology.

TDD might be useful for unit tests (even that I'm not convinced of) but not for integration tests.

The bug happens very deep inside the handshake protocol and it's extremely hard to test it automatically.

How about verifying a bad cert?

That's hardly an excuse given the significance of this code.

Right, this is a much stronger case for static analysis and warning-free builds. This change caused dead code. If that code was supposed to be dead, comment it out.

If the code is supposed to be dead, DELETE it. Do not comment it out.

If the code doesn't belong there, get rid of it. If, for whatever reason, you want to show the reader that code used to be there, delete it and leave an explanatory comment. If you're commenting it out because you're not confident in taking it out or you plan to re-enable it in the future, well, that's what you have source control for.

I don't agree. Or I do to a point, but it's a relatively weak point. I agree code should be yanked if there's no reason to be keeping it around. I don't think there's never reason to be keeping it around. Sending people to figure out which version of the repository had the relevant piece of code (which is now correspondingly less likely to have been updated at all when surrounding code changed) doesn't make sense if you genuinely expect it will be relevant again soon.

Because TDD in low level languages is a pain in the behind.

And sometimes impossible

In Python/Ruby sure, it's easy. In Java, or even C++, it's doable, but not great, and it requires some heavy tricks.

They surely need tests, though, just not necessarily TDD.

Not testing everywhere, but a smoke test that your SSL implementation protects from MITM at all, and that none of your programmers screwed it up (with gotofail or anything else).

This is what I expect from OS developers.

Yes, something that would test all the steps of the SSL authentication and possible failures

Maybe the people from OpenSSL have some test like that already

Ack re: TDD, but integration tests with a bad certificate would have picked this up.

And sometimes impossible

I've been in that position. Sometimes with embedded, you can't have software fixtures, only hardware ones. (Bit banging)

Tell me about it

And you think you know what the device answers and understands, but there's usually some gotchas.

Think of change, copy, paste, do something else that you suddenly thought of, forget because lunch, commit without looking. (TODO: Write unit test.)

Or a conspiracy. At this point, I just don't know any more.

Braces, y'all!

This is why you always, always, ALWAYS put braces around your if()s.

(I'm not religious about what line the open brace goes on, but I will reject any code review that skips the braces entirely.)

Also, you almost never need to use goto.

This isn't goto's fault. A return, or a "finished = true;" or a ton of other code could have had the same result. Using a goto means it is easier to find it than some other similar mistakes.

All three of those patterns are something which would lead me to consider whether there is a more functional-style option available to rewrite the method in question. I'll take a gigantic nested if over a split-end method (multiple exit points) any day of the week ;) But of course, such things cannot be avoided completely, nor will avoiding them magically eliminate all bugs in the universe.

Hm, and I'm of the opposite opinion. I like methods that fail early. Gigantic if-chains are hella-hard to parse.

I should go consult Code Complete again, actually...

Does no one use a pretty printer before committing code? Pretty printing would eliminate any doubt as to what code belongs to what if-statement and the like.

I use Astyle to enforce both braces and indentation automatically. In VC Pre-Build Event add:

    astyle -A1 -j -c -v *.cpp *.h

http://astyle.sourceforge.net/

This kind of mistake is why I always use curly braces with if() statements when I write C, and double-check the results through Vim's auto-indentation.

A few years ago, my son would watch "Finding Bigfoot." When someone said they saw a bigfoot running off into the woods, the narrator would say, "The bigfoot must have been frightened, so that's exactly what I would expect a bigfoot to do."

Another person would say they saw a little bigfoot standing on the river bank, and the narrator would say, "That's exactly the sort of place we would expect to see a baby bigfoot." [1]

If someone said they saw a bigfoot shopping in Walmart, we could expect to hear, "That's exactly the sort of a place a bigfoot would shop."

The conspiracy theorists and the honest-mistake theorists are both finding bigfoot. On the one hand, it's exactly the sort of exploit a state security organ would like to create, and on the other, it's exactly the sort of error a programmer can make in C. And neither side believes in the other's Bigfoot.

[1] We talked about the actual words that were used by bigfoot hunters to advance the story

I had an accidental line duplication in git the other day, trying to do a by-line commit with a visual tool.

I'm 100% convinced it's malicious... But otherwise, it could be explained by the coder adding another check:

     goto fail;
Then before committing, sending the diff to a colleague who emails back "you forgot a debug there, besides that ship it"

He removes only one line of two and ships.

And of course I always try to attribute those things to incompetence rather than malice, but since in this case there was probably money and bribes involved, I tend to the latter.

> but since in this case there was probably money and bribes involved

The cognitive process above is really sad: "I speculate this is what happened as a natural outcome of my other speculation"

my two speculations are really opposites (and the second is the most believable one for defending the coder's innocence, if that was the case). i presented both and pointed out the one i have a gut feeling towards.

I appreciate a little bit more now the coding standard at a previous employer (http://www.astronautics.com/) that INSISTED braces always happen, even for one-line conditionals like these. It was kind of irritating at the time, though.

i do this all the time in vim. all it takes is one extra keystroke 'p' to do this.

if i were to take a guess, the programmer first typed out all the if conditional expressions, then went in and pasted 'goto fail' out of a buffer after each one, and accidentally did it twice there.

That didn't happen - look at the diff in the OP.

that's bullshit. you don't know that it didn't happen - there's no telling what he did with the code in the time between commits. he could have moved things around 20 times, finally ending up in the same place, with the exception of the accidentally pasted line.

please use some imagination before jumping to conclusions. a diff tracks changes at points in time, not an editing session.

Do you understand diff format?

do YOU understand how diff works? it tracks changes between two points in time; it doesn't track how you got there or what you did in between.

he could have easily yanked a goto line 'yy' for use elsewhere or by accident and pasted it by accident with 'p' right after he yanked it.

i see two very similar lines, 49 and 53, which were probably copy/pasted and edited. it is completely reasonable to assume he accidentally did it with another line further down the file.

he could have refactored this code multiple times before re-committing this file, moving blocks around or editing lines. there's plenty of opportunity to introduce errors.

do you understand how editing works?

It's just that the code had been around for years by that point, and given that, that's not the story the diff tells.

that's true but there are other, material changes in the file including conditional logic very close to the bug - to me that says there was at least some additional editing going on, giving opportunity to introduce an error.

if the entire diff was just + goto fail; that would be extremely suspicious, but that doesn't seem to be the case.

Come on, just say it - we all know what you are thinking. Bruce Schneier said it. You should know that we know what you are thinking - yes, there is a very high chance it was put there deliberately.

that's like telling programmers that Apple doesn't use a repo system, so there could not be any trace of this.

And why don't we hear anything from the guy who might have made this mistake, IF IT WAS AN ACTUAL MISTAKE?

I don't like conspiracy theories, but if I were Apple, I'd fire the guy for incompetence. I mean, of all the code that has to be secure, you have to check this one.

this looks exactly like a copy paste error - there are also plenty of shortcut keys for duplicating lines or transposing them...

Sometimes people make mistakes.

Don't know why you got voted down. You make a very decent point!
