It's interesting watching all the speculation about "was it a backdoor, or just a bug?"
Lots of points for and against:
1) It's a huge compromise, and "open" to anyone to exploit, which would ultimately get caught and fixed faster. But it's also not targeting anything specific, so there's less of a signature of the attacker.
2) Incredibly simple, and thus a plausible mistake.
3) Hidden in plain sight
I'd generally come down on the side of "accident". The better question is whether a systematic testing system for exploitable weaknesses (or a lucky chance finding) could find something like this (either a regression, or a badly implemented new feature) and exploit it -- basically assuming there are going to be errors in implementations. That's understood to be a standard technique in the IC, with a whole industry built around it... and then there's the scale of the compromise.
There are lots of mitigation techniques for vulnerabilities like this (essentially, devices which are too fast-changing and too difficult to fully trust, but which have to touch sensitive data), but it's not as if people can carry around a firewall in their pocket these days, sadly.
I'm certainly re-generating any keys which touched iOS or OSX devices (thankfully very few), and reinstalling OSes, in the interim.
My gut says the probability of a random typo executing without raising an error or exception is rather low. The probability of it doing so in a way that well aligns with the interests of nation states, large corporations, and/or criminal enterprise is even lower.
This might explain DROPOUTJEEP, particularly considering how much more efficient it is than breaking messages after they are encrypted.
Sure, this aligns with interests, but the bug's existence is predicated on the entire code base being written with substandard style rules and no static analysis or tests, which suggests to me that incompetence got here first.
We begin therefore where they are determined not to end, with the question whether any form of democratic self-government, anywhere, is consistent with the kind of massive, pervasive surveillance into which the United States government has led not only us but the world.
This should not actually be a complicated inquiry.
Whether it was incompetence or malice, whoever was responsible for the extra goto, as well as whoever was responsible for ANY of that code not following common practices including ALWAYS using brackets, should be fired.
And the fact that this bug and terrible coding style was in the publicly available source code for so long totally disproves ESR's "many eyes make all bugs shallow" myth. Thanks for the false sense of security, Eric. The chickens have come home to roost again.
If "many eyes make all bugs shallow" were true, somebody would have raised the flag "Hey everybody, these Bozos are actually omitting brackets in their SSL code! Somebody could carelessly insert a statement and accidentally (or maliciously) introduce a hard to see bug some day!"
Hardly anybody actually bothers to read code in the real world. So there aren't "many" eyes, and even if there were, many bugs aren't shallow even to expert eyes, and "all" bugs will never be shallow to most eyes.
That's why it's important to pay competent security professionals to actually take the time and effort to audit source code: difficult, tedious, mind-numbing work that takes them away from other valuable, higher-paying work.
Weaknesses are what sophisticated adversaries attempt to exploit. Lack of static analysis in the code base might make an exploit take a particular form and make it easier to craft an exploit.
But it's the size and quality of the target base that makes it worth trying, considering the way iPhones may be present in even a security-savvy individual's social context.
It is mind-blowing that they may not run static analysis on something as big as an OS code base, which would catch something as simple as a duplicated goto. It could easily have been a copy-paste fail, but that is why on a project this big you need to have it. Or, if they do use static analysis on builds and it failed or bypassed this area, there is another possible hole.
> My gut says the probability of a random typo executing without raising an error or exception is rather low
In many code editors, numerous fat-fingered shortcuts could produce such a compilable line duplication/deletion (deletion, because maybe that's not a duplicated goto line but a deleted test line).
I'm baffled that neither clang nor gcc spits a warning for the unreachable code.
Not trying to shift blame or anything, but rather than finger-pointing, I'd rather see this as a warning for all of us to up our game and improve our tools so that everyone benefits, reducing the risk of both honest and intentional or covert[1] malfunctions.
Doesn't the NSA have access to several of the private keys of the preinstalled root certificates anyway? If so, they can MITM any SSL connection where you don't specifically pin which CA cert should be used.
Isn't it the case anyway that SSL is not very effective against state-sponsored attackers, since almost all of them can generate certificates for any domain?
All of this speaks towards it being an unfortunate mistake rather than something malicious, unless there is something wrong with my understanding of how things work :-)
In general, NSA is not my most serious threat (in terms of actual harm; they are the most powerful by far though) -- I agree they overreach, and for some people are a serious threat, but for me my primary concerns are Chinese/other foreign intel (who are documented as going after industrial/economic material much more than NSA) and independent/criminal/etc. types.
The scary thing is the bar is so low; even I could turn this into a nice weaponized thing to go after the long tail of people who don't upgrade over the next few months. The "diode" etc. infrastructure NSA built isn't that different from botnet C&C or the kind of relays people have when hacking.
I wish more people understood this. Not to let NSA off the hook (what they are doing is awful), but the threat posed by the NSA is a higher-level, down-the-road, slippery-slope threat.
There are other more immediate and real dangers out there that are actively trying to steal whatever they can find.
What I would like more people to understand is that the NSA planting backdoors into everyone's accounts and software makes everyone less safe to both interior and exterior attacks. So no, it's not a case of NSA spying being a lesser threat than eastern spying; it's a case of the NSA punching holes in your walls, letting everyone see what's inside.
I believe the faulty function -- at line 574 -- is a copy/paste of another function located at line 328 in the same file.
I diff'ed both functions to try to understand the changes which may have been made.
The "goto fail;" is somewhat "drowned" into other changes around, so it stands out less using a diff tool because of these surrounding changes. The surrounding changes in question:
1) Moving a "SSLFreeBuffer(&hashCtx)" instruction at a higher position: Very strange since it accomplish absolutely nothing at its new position, as opposed to it's original position.
2) Renaming of two variables: "hash" became "hashOut" and "exchangeParams" became "signedParams".
3) Indentation of "goto"s
Without the above changes, the "goto fail;" stands out rather well when using a diff tool.
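For reference, here is the region in question, reconstructed from the published sslKeyExchange.c (excerpted; variable names as above):

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;   /* the duplicated line: unconditional, so the final check never runs */
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;      /* still 0 (success) when the second goto is taken */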
It was likely someone subverting the process for reasons of urgency, good intentions, or malice. Simple static analysis or unit tests would have caught this.
That's because "for (;;) ;" is and idiom that's commonly used in a programming pattern called the "for(;;);ce field", to work around other C and C preprocessor quirks, by syntactically insulating and protecting statements in C preprocessor macros from outside interference.
As a matter of fact, by deploying one of those at the beginning of the macro, one in the middle, and one at the end, you can set up what's called a "trifor(;;);ce field", an anti-anti-pattern that is documented on the c2 wiki thusly:
The trifor(;;);ce has the power to grant the wish of whomever touches it, and molds the Sacred Realm to reflect that person's heart. The trifor(;;);ce does not discriminate between "good" or "evil". However, if a person without an equal balance of power, wisdom, and courage makes a wish, the trifor(;;);ce will split into its three separate parts: the piece that best personifies the wish-maker will be the only piece to remain in hand, whilst the other two will take residence in whosoever most personifies them. Reassembly is then required for such a person's wish to be granted, but does not exclude another from making the attempt.
No, I'm just joking and making shit up. Don't anyone ever do that! As bitchy and pedantic as gcc and clang are about other things, I'm disappointed they don't complain about that.
I'm baffled that this wasn't caught long, long ago. Most of us have worked with internal systems where certs, for whatever reason, don't match the site, and it's surprising more people haven't noticed that this doesn't raise any errors on iOS/OSX.
Though thinking back... I actually remember encountering exactly that on the iPad once, surprised that it didn't raise a flag. Like probably most people, I just shrugged and continued on.
Because it does not work like that. SecureTransport actually verifies the validity of the TLS certificate. If you hand it an invalid certificate in the first place, it will complain.
The issue is one level deeper. When you connect to a server that supports ephemeral key exchange, the parties involved will generate new key pairs on the fly to use for that connection. In order to make sure that the server has the key published in its certificate, the ephemeral public key shown to the client must be signed by the static private key of the certificate and the client MUST verify it. It is the ephemeral key signature that is not validated, not the signature on the certificate itself.
What happens here is that the server gives the client a _valid_ certificate for which it does not own the private key, and in the ephemeral step, it simply generates a key pair without a valid signature. At no point in the handshake is the server asked to prove its ownership of the private key associated with the certificate it is presenting.
To summarize: the TLS certificate still has to be valid, but the server is never asked to prove it has the certificate's private key when the connection uses DHE or ECDHE.
Worth noting that static analysis finds bugs like this immediately. TLS code seems like the perfect candidate to run through static analysis on every checkin. There are products such as Coverity and PVS-Studio that would have immediately flagged this and probably some open-source ones built around LLVM as well (unsure about this one, though). I personally use Coverity and have it hooked up in the same way everyone connects Travis CI.
And if you are going to disable a specific warning, it should only be for a limited section of code and include a detailed comment about why the warning is superfluous and why the code is safe.
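To illustrate (a hypothetical handler of my own, not anything from the Apple code), clang's diagnostic pragmas let you scope a suppression to just the code that needs it and keep the rationale right next to it:

    /* Hypothetical example: suppress one warning for one function only,
       with a comment explaining why the code is safe despite it. */
    #pragma clang diagnostic push
    /* 'context' is required by the callback signature but intentionally unused here. */
    #pragma clang diagnostic ignored "-Wunused-parameter"
    void on_event(void *context) {
        /* handler body that never touches 'context' */
    }
    #pragma clang diagnostic pop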
If the code hasn't been linted, expect lots of false positives. I usually have to disable entire classes of less serious warnings to notice the errors, then slowly work through to the less serious stuff.
Regarding Adam Langley's comment about -Wall in gcc: -Wextra is what you want to have many more tests analysing your code. I wonder if this would have caught it.
A quick test shows neither -Wall nor -Wextra report anything on either gcc or clang. However, clang's -Weverything does complain (it implies -Wunreachable-code).
Confusingly, gcc accepts -Wunreachable-code as a valid option, but then proceeds not to warn about anything. (edit: which is apparently a known bug.)
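As a quick sanity check, here is a minimal compilable sketch of my own (not Apple's code) with the same control-flow shape; per the observation above, -Wall/-Wextra stay silent about the dead code while clang's -Weverything flags it via -Wunreachable-code:

    /* dupgoto.c -- minimal reproduction of an unconditionally duplicated goto */
    #include <stdio.h>

    static int check_a(void) { return 0; }   /* succeeds */
    static int check_b(void) { return -1; }  /* would fail, but is never reached */

    static int verify(void) {
        int err;
        if ((err = check_a()) != 0)
            goto fail;
            goto fail;                       /* always taken; err is still 0 here */
        if ((err = check_b()) != 0)          /* unreachable */
            goto fail;
    fail:
        return err;
    }

    int main(void) {
        printf("verify() returned %d\n", verify());  /* prints 0 */
        return 0;
    }

Compile once with `clang -Wall -Wextra -c dupgoto.c` and once with `clang -Weverything -c dupgoto.c` to see the difference.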
Excessive fear of backwards compatibility: you make -Wall. People use it. You add a new check which will reject some shoddy code. “What if people complain that their project no longer builds with -Wall? Let's add a new setting…”. Repeat…
I've been wondering how hard it would be to get to the point where the defaults are rigorous and developers have to opt out with specific -Wno-… options.
Apple introduced -Weverything because suddenly changing the scope of gcc's -Wall (which indeed does not include everything) would break a lot of builds.
Yep. Even ones that may not be fully baked, so some people caution against it.
We have it turned on, and about 10 warnings specifically disabled because they were too noisy or not useful for us. It's always interesting upgrading Xcode and seeing what new warnings we get to fix.
Microsoft VC++ has a -Wall warning level too, and it really does mean all warnings. Each major version of VC++ potentially introduces new warnings to -Wall.
My personal observation is that while compilers continually get better at identifying straightforward mistakes, they don't have the same capability as a static analysis tool that works across compilation units. That is the real selling point of these tools. Definitely -Weverything/-Wextra plus static analysis for baseline checking.
My other beef is that compilers always add new warnings as options or behind new "catch all" flags like -Weverything that no one knows about. As long as each new warning can be individually disabled, there isn't a huge cost to enabling many more of them by default. Upgrading to a new compiler version usually requires a tiny bit of work, so adding a few -Wno-* rules for new things you want to disable until the code is clean (or forever) is a small price to pay for all new code getting the checks by default.
Unfortunately OSX does not appear patched even in the latest developer center 10.9.2 build (13C62). Tested in both Safari and OS-distributed curl. Chrome/Firefox of course is still fine since it uses the NSS stuff, but plenty of OS services use the OS crypto. (I'm violating NDA by commenting on pre-release Apple stuff, of course.)
Windows or Ubuntu via Boot Camp until they fix this, I think.
MITM of any SSL connection in Safari and other system apps, and they couldn't even bother to have an OS X patch ready at the same time as the disclosure for iOS? I think anyone relying on the security of OS X is going to have to seriously rethink their OS choice after this.
This. Canned my MacBook Pro over this. It's been a bumpy relationship anyway. Just on Thursday Apple Mail stopped sending mail to my SMTP server. Final straw for me.
Tell that to the people who generated keys on Debian.
Programmers are simply not good enough at writing secure code. Full stop. If you say anything else, you're just flaunting your own unreliability as a source of security advice.
At least that was public and fixed. I think the point is that keys could be completely deterministic on closed source systems and there is no way to know. It's very easy to have deterministic "random" sources that pass statistical tests for randomness.
How is what I said not true? If the same thing as the Debian PRNG bug happened in a closed source system, it could sit for a couple of years exposing thousands/millions of systems and then be patched quietly to avoid embarrassment, leaving everyone vulnerable.
Static analysis would have caught this immediately. It caused unreachable code that the compiler even recognized and optimized away.
You're saying Google Chrome doesn't test that an ephemeral key is actually signed by the cert's private key? If that's the case, that's completely unacceptable because it's the whole point of the protocol.
I think the various BSD folks, especially those from the OpenBSD team, would like to have a word with you. There are many other unixes out there that put a strong, probably even stronger than Linux, emphasis on security.
Um, sure, after it updates. Assuming a similar bug isn't found there and exploited when it connects to the Internet. Since, you know, it can't do anything without a network connection. Just playing devil's advocate here; I like ChromeOS too.
Since the source code is available, might it be possible to produce a hot patch to the binary so that those of us running Mavericks won't have to go through the next few days or weeks with our pants down? It would be a simple matter of finding the JMP instruction generated by the second GOTO and replacing it with no-ops. How hard could it be, at least for someone who actually knows their way around OS X binary file formats?
Likely the compiler will have optimized away the rest of the function as dead-code, so it's not as simple as just getting rid of the unconditional jump.
It appears to be, from a cursory glance. It loaded for me in Xcode. That said, signing the security framework would be another issue ;-) I can't imagine Apple would let malware replace security binaries that easily.
(Caveat: I haven't tried this myself!)
Here's the analysis and an experimental patch from @i0n1c for the 64bit version of Mavericks' Security.framework (32-bit remains unpatched):
http://sektioneins.de/blog/14-02-22-Apple-SSL-BUG.html
Not very well tested, please let me know if it works for you. If you're on OS X Mavericks or on iOS 7 and haven't patched you should get big scary red text.
There's not a whole lot I can do about that without adding a lot of complexity. You could try downloading https://gotofail.com:1266/test.png I suppose.
I noticed a while ago that while Safari supports TLS 1.2, Mail does not. So somewhere in the implementation they are using different code. (Not agreeing or disagreeing, just mentioning an observation.)
Yes, I did it this morning. Look for the update from your phone, so it's a smaller download anyway. It takes more room than they ask for, maybe 750 meg. I just temporarily deleted some podcasts to make room.
From looking at the code it seems like there should be a way to change the instructions to "fix" the issue, but in a round-about way. This code:
if (sslVersionIsLikeTls12(ctx)) {
/* Parse the algorithm field added in TLS1.2 */
if((charPtr + 2) > endCp) {
sslErrorLog("signedServerKeyExchange: msg len error 499\n");
return errSSLProtocol;
}
sigAlg.hash = *charPtr++;
sigAlg.signature = *charPtr++;
}
is only executed in the TLS 1.2 case, but the sigAlg structure is always on the stack. So if this code remains "skipped" in the non-TLS 1.2 case, then later on:
the broken "else" case can be replaced with instructions to poke the proper values into sigAlg and then relative jmp to the code offset where the inlined SSLVerifySignedServerKeyExchangeTls12 begins (0x86cb9), as that version of the function does not have the bug.
I checked inside the SSLVerifySignedServerKeyExchange disassembly and the compiler expectedly omitted the remainder of the function, so it isn't as simple as sticking a few nops in.
That's a pretty interesting approach. Might even work!
I wonder if something simple (and stupid) like an LD_PRELOAD with an alternative fixed SSLVerifySignedServerKeyExchange would work. Won't work if the code ends up getting inlined.
Guess not, looks like SSLVerifySignedServerKeyExchange is static.
I'll admit I don't think of myself as great at unit testing, but the first thing I do when writing tests for a new tool or class is use a code coverage tool to look for uncovered lines; once I have written a few basic behavioral tests, I write tests to exercise and validate the output of the uncovered lines.
This ensures that the tests I write to cover the public API don't miss catching internal behaviors that require more subtlety to uncover.
My theory is that Apple is spread too thin. Kayak.com reproducibly crashed MobileSafari the day iOS7 shipped, and brand new iPad minis regularly kernel panic and reboot.
An integration test wouldn't catch this one. You need a specific malicious SSL server (presents a valid certificate, uses ephemeral mode, does not present a valid signature proving that it owns the private key). The code does validate certificates, otherwise this would've been caught ages ago by anyone trying an invalid cert.
Well, as the article says: A test case could have caught this, but it's difficult because it's so deep into the handshake. One needs to write a completely separate TLS stack, with lots of options for sending invalid handshakes.
Something one would reasonably have expected from the second-biggest company in the world (NASDAQ, Fortune 50), and the first in the IT sector.
The entire OS is far too huge a piece of software for Apple not to test intensively, to a fanatical extent. Since all the NSA revelations and the proven extent to which the US government lied and continues to lie to its people about all the data gathering, I find it hard to categorize such a simple yet brilliant oversight as a mistake.
I cringe whenever I hear people advocating that it's okay to avoid braces with single-statement conditionals/loops/etc.
This is a perfect example of just how bad that suggestion is, and just how disastrous it can be. It's totally worth typing a few extra characters here and there to basically avoid these kind of situations completely.
I hate omitting braces, but Javascript semicolon omission is a totally different argument. It's easy to show simple cases where omitting braces causes problems. It's actually quite difficult and artificial to find cases where (especially with "use strict") semicolons confer any benefit at all, and in some cases they make things worse (e.g. multi-line variable declarations). Also, "correct" use of semicolons in Javascript makes code less consistent (e.g. some function declarations end with a semicolon, others do not) whereas consistent use of braces is consistent.
I certainly see far more bugs caused by an improperly inserted semicolon than an improperly omitted semicolon, but then I'm looking at jshinted code most of the time.
I used to believe as you do that semicolon issues were contrived and unlikely in JavaScript. But over the years I've been bitten by it enough times to know better.
IIFEs are the most common source of semicolon problems:
for(i = 0; i < 10; i++){
  x = arr[i]            // no semicolon here, so the '(' on the next line continues this expression...
  (function(x) {        // ...and the whole thing parses as x = arr[i](function(x){ ... })()
    setTimeout(function(){console.log(x)})
  })()
}
A less common sort of pattern that I still use pretty often as a DRY measure:
This is clean and readable, and without semicolons it's completely wrong.
Now sure, you can prepend semicolons to these specific cases, and that will work. Here's why I dislike that:
1. Those leading semicolons look weird. To the uninitiated, it looks like a typo, and that's a significant hazard if anyone else is going to touch your code.
2. While there are only a handful of special rules like this to remember, there's only one rule to remember if you don't rely on ASI.
Good examples actually. (Makes me feel better about my ingrained semicolon habit.)
Your examples will crash immediately and at the right spot though. The problems I see caused by excessive use of semicolons are often far weirder.
That said, inadvertent errors caused by semicolon insertion are still more common and baffling (especially by people addicted to jslint who use a variable declaration idiom particularly easy to screw up with errant semicolons).
The first example won't crash if the rvalue preceding the IIFE is a higher-order function (specifically, one that outputs a function).
A relatively rare scenario, but a brutal one to debug.
As for errors caused by extra semicolons, they can be weird, but I don't think I've ever actually hit one in practice. They'd also be a little easier to spot, since you tend to develop a reasonable instinct for where semicolons belong.
I'd say it's a lot easier to be suspicious of lines of code that start with '(' or '[' than find semicolons at the end of the wrong line. It's incredibly easy to put a semicolon where a comma is expected and vice versa. I do it all the time (usually causing an immediate error so it's not a huge deal).
...just overwrote c in a different scope. This kind of bug is common, idiomatic, baffling, and actually more likely among coders subscribing to javascript "best practices".
jslint won't stop it if there's a variable (c in this case) in the outer scope -- it's perfectly fine code as far as jslint is concerned. And jslint encourages that declaration style (actually encourages is too weak a word).
> I do write single statement conditionals without braces, but I write them on one line
That's fine until you write
if (something) do_something(); do_something();
or worse:
if (something); do_something();
I've actually seen something very similar to that one.
You haven't really changed the dimensions of the problem by putting it all on one line. Only the whitespace is different. Sure, it looks wrong to you; but so does the incorrect code from Apple.
I don't understand the need to omit braces; it doesn't make the compiled program any smaller. And denser source code is not always more readable, or we'd prefer "? :" to "if" every time.
This is true; there is no silver bullet. So we rely on defence in depth.
The first line of defence is consistent code layout; e.g. always use "{ }" after an if, and only one statement per line. This aims to make simple typos less likely to compile, make the code more readable and so make errors stand out.
Then there is using and paying attention to the linter / compiler warnings / jshint + 'use strict' / static analysis or whatever it’s called in your language of choice.
Then there are unit tests or integration tests.
Any of these might have caught the Apple error, but there are no absolute guarantees.
IMHO it's a professionalism issue – gifted amateurs are great, and they can succeed... most of the time. But a professional engineer would put aside ego "I won’t make mistake like that" and laziness "I don’t need to type braces this time" and do the careful thing every time. Because each step raises the bar, and makes it harder to ship code with bugs like this. Because sometimes it matters.
Funny thing is, I was finally looking at Rust the other day and thought, hey, that's nice that if statements are forced to have braces, I never liked one-liner if statements in C. And here we have a perfect example of what can happen with the brace-less if statements.
Requiring braces is the wrong solution to the problem. It's the indentation that misleads here; indentation is what people look at to see what's part of which if, so have the language look at the same thing, like Python does.
That's the thing - with an indentation-based syntax, the code would have been okay! The second goto would have been part of the if statement, so it would have had no effect beyond the aesthetic.
Interesting that, if there were a source code formatting tool (like gofmt for Go), it would've made the bug easier to spot, because the 2nd goto would lose its misleading indentation.
Outside of python, for obvious reasons, it should never be allowed. The slightly nuts conditionals in Erlang are justifiable for avoiding exactly this kind of problem.
Agreed. Mandatory curlies might have helped avoid disaster here, but the code in question is still utter rubbish. If blatant nonsense makes it past review (or if there is no review), no coding style rule in the world is going to help.
I came here to say this. Yes, testing is important. Don't ignore it. But let's not lose sight of the high value of smart, time-tested coding conventions.
They exist because of the long, bloody history of mistakes by very smart people.
If you have a programmer on your team who is likely to modify
if (condition)
{
doSomething();
}
into
if (condition)
{
doSomething();
}
{
doAnotherThing();
}
instead of
if (condition)
{
doSomething();
doAnotherThing();
}
then that person needs some serious mentoring right away. 'Cuz. . . just wow.
As far as the original example goes, if it's an error it's most likely a copy/paste error. Curly braces help there, too. With three times as many lines in the block, the odds of a paste error resulting in code that even compiles is greatly reduced, and a compiler error should call attention to the issue.
Even assuming the bug was malicious, the curly braces would increase the odds of it being caught by another developer. This is particularly the case now that offside rule languages are common. A large chunk of younger devs cut their teeth on languages where
if (condition)
doSomething();
doAnotherThing();
doesn't look odd in the slightest. But I think that the 2nd example above would still look immediately bizarre to nearly everyone.
> As far as the original example goes, if it's an error it's most likely a copy/paste error.
Right, and this demonstrates the major problem with verbosity in languages and APIs and design patterns. When you have to repeat yourself many times, it's very easy to make a mistake in one of the near-copies, and you or a code reviewer can miss it because you'll tend to quickly skim over the boilerplate.
For cases like this, using exceptions rather than manual error value checking would make your code shorter, less redundant, and not susceptible to this particular bug. (Not possible in C, I know).
I understand that there are sequence points at those || divisions, and I appreciate that you've been careful with your layout. Even so, my spider sense is tingling horribly at having not just one assign-and-test in the condition for an if statement but a whole series of them that each reassign to the same variable.
If there were a language feature available to do this more elegantly, whether exceptions or something else such as a Haskell-style monad with fail semantics, I'd almost certainly agree with using it in preference to a series of manual checks, though.
>> Is there a case where the compiler won't short-circuit the || statement on the first failure?
Left-to-right, short-circuit evaluation of the logical operators is so deeply rooted in C idiom that a mainstream compiler that doesn't respect it would be practically unusable. But perhaps it wouldn't work in a hobbyist compiler, or a non-standard dialect of C.
Indeed, the code as presented there had well-defined behaviour according to the standard. Note to lunixbochs: This is why I mentioned sequence points before.
While valid according to the language definition, assignments within conditional expressions do have a reputation for being error-prone and not always the easiest code to read. Sometimes they're still neater than the available alternatives.
However, IMHO in this case, it feels a bit too easy to break the code during later maintenance edits. For example, someone might simplify the code by removing the explicit comparisons with 0 that are unnecessary here, and then later someone else might "fix" an "obvious bug" by replacing the "(err = x) || (err = y)" construction with the common construction "(err == x) || (err == y)" that they assumed was intended, perhaps prompted by a compiler warning. Obviously they shouldn't if they didn't properly understand the code, but when has that ever been a reliable guarantee? :-)
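For anyone unsure about the short-circuit behaviour under discussion, here is a small standalone sketch (stub functions of my own, not the Apple code): evaluation proceeds left to right and stops at the first step that returns non-zero, with err holding that step's result:

    #include <stdio.h>

    static int step_a(void) { return 0; }    /* succeeds */
    static int step_b(void) { return -42; }  /* fails */
    static int step_c(void) { return 0; }    /* never evaluated: || short-circuits */

    int main(void) {
        int err;
        if ((err = step_a()) != 0 ||
            (err = step_b()) != 0 ||
            (err = step_c()) != 0) {
            printf("failed, err = %d\n", err);   /* prints: failed, err = -42 */
            return 1;
        }
        printf("all steps succeeded\n");
        return 0;
    }

The maintenance hazard described above is that "(err = x)" is only one character away from "(err == x)", which is why some reviewers would still prefer one check per statement.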
I agree, I was thinking about what this situation would look like in other langs and when I turned to Go, I realized:
While Go has goto for tricky situations like this, because it has defer you don't have to use it often, assuming the free calls were needed (and the vars were not going to be GC'd):
I would probably flag the code above in a security review because it hides the key part at the end of a complex line. Unless you're coding on a VT100 terminal, it's worth the extra line to make the test logic incredibly obvious:
True, but I've definitely noticed that particular style of writing if tests using a one-line assignment and obscured test condition seems to be pretty common in the Go community and it's a bad habit for understanding code.
Is there objective evidence for this? As a Go programmer, a semicolon in an if statement screams at me. I can see it possibly being an issue for new Go programmers, but I don't remember it being one for me.
I didn't do a survey, but I remember that and frequent punting on error handling (`res, _ = something_which_could_error()`) showing up often enough in the projects I saw on GitHub to stand out as a trend when I was writing my first few programs. I certainly hope that's just sampling error.
The `else if` is useless here, since you return anyway. It only adds noise.
I rarely use the one-line `if err := ...; err != nil` idiom because it's quite a mouthful. However, when I do, I try to make sure it's not too much to grasp at once. Here the extra `else` goes against that.
Alright, I know this is just a quick snippet on HN and all, I just thought I'd mention it anyways. Maybe next time you actually write that in code you'll think about my point. ;)
Couldn't you just set a 'failed' boolean and wrap the fail: statements in a conditional check on it? Not that I'm saying all uses of goto, particularly this one, are necessarily absolute evil.
This is what goto statements are good for. The Linux kernel also uses goto statements a lot for similar reasons.
This is not spaghetti code or what Dijkstra talked against.
It's one of the two things one hears from novices: that gotos are always to be avoided (because they heard that they are bad),
and that we should write stuff in assembly (because they heard that it's fast).
I don't think you necessarily need unit tests to catch this; code review would work (even I probably would have caught this if I'd actually read it; "fucking gotos" would have drawn my attention to begin with, and if it was a single line commit, even more so.)
The problem is Apple (intentionally) understaffs and overworks, so I doubt they have the spare people to look at most commits.
Can someone explain to me what this code is? It appears to be implementing standard crypto procedures -- are there not suitably licensed public implementations that Apple can use? If they are writing it themselves, why are they open-sourcing it? (I don't have a reason why they shouldn't; it's just that I've never associated Apple with much interest in open-sourcing things.)
Using public implementations for crypto code has been difficult until pretty recently. There didn't exist many high-quality, trustworthy, public implementations of the types of crypto that real-world systems have wanted to do.
Even now, the situation is pretty grim for anyone who wants to integrate crypto code into an existing large codebase. The canonical answer is "just use NaCl." However, I've had developers outright refuse to use NaCl because "it's so hard to compile." They were saying NaCl has some issues compiling on certain types of architectures, and issues compiling as a dynamic library. When libsodium was suggested in response, they retorted that libsodium was "NaCl but with modifications by people other than the ones you'd want making modifications to crypto code." In other words, libsodium purports to be "NaCl, but easy to integrate into your project," yet the reason it's easy to integrate is because someone other than recognized cryptography experts fiddled around with NaCl until it was easy to compile. So apparently it's an open question whether libsodium is as trustworthy as NaCl, and NaCl is a pain to integrate.
All of this implies that the current situation is still not very good as of 2014 for people who want to just do crypto safely and in some standard way. And rather than writing that code in 2014, Apple had to write it years ago, when the situation was far more painful than the present day's shortcomings.
To answer your question, yes, there are public implementations which could theoretically be used. But actually using them is... difficult. Not nearly as difficult or as error-prone as rolling your own, but perhaps difficult enough where someone who isn't an expert in cryptography might make the dire mistake of believing that rolling their own is less difficult.
I wish Matasano would publish an open-source crypto library. An "NaCl that's easy to use and that people actually trust." They have the resources and the credit to pull it off.
Alternatively, I wish we could gather funds to have them audit libsodium.
Also, all of this is based on the assumption that NaCl or libsodium actually meet the needs of what Apple was trying to do with this crypto code. It probably doesn't, because NaCl / libsodium provides a complete replacement to the entire crypto stack of an application. To take a wild guess, I'm thinking that Apple needs to implement / interact with some other kinds of crypto than what NaCl/libsodium provide. For example, certificate validation. So then you'd suggest "use cryptlib!" but this is hampered by the fact that in addition to all the previous considerations, cryptlib also isn't free.
Thanks! Is it surprising that they open-source it? That gives their adversaries the opportunity to look for weaknesses in their code. And as for the benefit to Apple... it seems to be exposing the fact that they have a mixture of tabs and spaces and have let someone edit an absolutely critical file with no code review or testing.
What you say is true. But security through obscurity doesn't work, because people are very good at reverse engineering and decompiling. So your choices are either: open source your crypto code and suffer some embarrassments which people will forget and forgive, or have closed source crypto code that either doesn't work or can be exploited in subtle ways. Pick your poison. :)
Emerging tools such as gofmt and clang-format are able to automatically re-format your code based on the language rules rather than the human behind the monitor.
Using those tools, this specific category of issue should at least be visible in the formatting diff.
Interesting. Never thought about it that way before.
Those kinds of tools aren't new or emerging. The "indent" utility for C has been around for decades. I first remember using it on 4.3BSD, but it may have been around before that.
NAME
     cb - C program beautifier
SYNOPSIS
     cb
DESCRIPTION
     Cb places a copy of the C program from the standard input on the
     standard output with spacing and indentation that displays the
     structure of the program.
Worse, lint was part of the original C toolchain, but very few people cared to use it, because they couldn't be bothered to tune the tool to their projects.
This is the type of error that any static analysis tool would easily pick up.
Trying to get teams to agree on the code style is the hard part when using such tools. `gofmt` somewhat avoids that by defining one true way and only giving minor customization options.
clang-format has the same property. You just have to tell off a bunch of neckbeards and/or fire them if they can't agree to its formatting rules.
Having cast-in-stone formatting rules makes code review easier and shorter and avoids lots of arguments. For ultra-dangerous languages like C++, automatic formatting is indispensable.
I'm going with accident, either through a merge or a double paste. I'm guessing that if they had been running some kind of static analysis tool, it probably would have flagged this as a duplicated line and someone would have caught it earlier.
The static function that has the bug will most likely be inlined into the other static function which will then be inlined into the outer public symbol that calls it.
I believe you can dump the assembly of the entire function like this:
otool -t -p _SSLProcessServerKeyExchange \
-V /System/Library/Frameworks/Security.framework/Versions/A/Security | less
edit: I believe this could be the offending instruction:
0000000000086df6 jmp 0x86e0d
edit2: That is actually in SSLVerifySignedServerKeyExchangeTls12(). Trying to track down the real one..
edit3: Looks like the compiler did optimize it out if I'm reading it correctly now. The code is around 0x86c97. It does the last if() call and then immediately calls SSLFreeBuffer() and jumps to the end. :(
I would be really surprised if the rest of the function was still there - this seems like the perfect example for a dead code elimination compiler pass.
The faulty block of code -- line 623 (function SSLVerifySignedServerKeyExchange) -- looks like a cut and paste (or the reverse) of the block of code at line 372 (function SSLSignServerKeyExchange), except this one doesn't have the extra "goto fail;".
Whoever wrote that code even before it had the bug was an incompetent cowboy hotshot who wanted to show off how pedantically he knew and could surf the nuances of syntax and optimize the code to have the smallest source file size by penny pinching brackets.
"ALWAYS USE BRACKETS" is one of the oldest and wisest rules in the book, and this shows why. Anyone who actually writes security related code like that, blatantly ignoring important coding practices in the name of "aesthetic beauty" or however they rationalize omitting brackets, should be fired.
I bet they're the same cowboys who fanatically proselytize JavaScript automatic semicolon insertion too, because that's another outlet for their obsession with showing off how clever they are to write code that requires them to deeply understand and consider every nuance of the syntax at every keystroke, instead of facing the fact that they or somebody who comes later might actually make a mistake, and coding defensively.
Maybe they really are fucking geniuses who never make mistakes, but that's not true about everyone else who has to read and modify the code.
Apple should go through all their code and fix it, and dock the salary of anyone caught not using brackets.
Or an intern to rewrite the entire thing from scratch. The code quality of the entire library is awful. No comments. Impossible to determine ownership of pointers. MIXED TABS AND SPACES?? WTF?
People are downvoting my comments all over this thread but really, if this is not literally the worst code in iOS then I'm throwing away my iPad immediately. The priesthood of cryptography is always warning the lay programmer to avoid re-implementing crypto, and instead to use libraries written by experts. But the crypto experts are apparently good at math and terrible at programming computers. This Apple library certainly isn't the only evidence of that. OpenSSL is largely uncommented and untested as well.
This shows another of the benefits to the community of open-sourcing code: because we can see exactly where and what the bug was, steps can be taken in other projects to stop it happening there (I think Adam mentioned he was going to check for a test case in Chrome).
If the code was closed all we would have is Apple's release note which just says validation steps were skipped...
I loaded up the Xcode project for Security.framework and it complained about disabled compiler warnings.
I also noticed their open-source download page [1] doesn't force you to https by default, which probably matters to anyone looking at Security.framework to fix SSL MITM problems :)
Doesn't look like they make their frameworks easy to compile for outsiders. The only corecrypto headers I can find are in XNU, and they appear to be outdated.
I'm hand-constructing what I can from context. Here are the defines from ccasn1.h: https://bochs.info/p/6jquf
You need these too:
typedef int ccoid_t;
extern size_t ccder_sizeof_raw_octet_string(signed long length);
Currently working on libaks.h and corecrypto/{cczp,ccec,ccec_priv}.h
I wonder if this would have as much attention without the Snowden leaks. I guess yes.
Not to be a conspiracy theorist, but given the history of how computer technology was released to the public, it still looks to me like there is no absolutely objective, safe way for a layman to use a computer. It still looks more like a tool of surveillance than a tool of communication.
If you want my opinion: if I were Apple (or Linux, back in the days of that certain commit with the =!0 thing), I'd try to sue the guy who was responsible for this commit.
You could argue in court that it's impossible to prove whether it was deliberate, but I guess we must set an example and start holding people responsible for this. It's time to make the courts aware of computer security standards...
You cannot sue a programmer for writing a bug in software, or at least it would be thrown out immediately. It is the company's responsibility to ensure that the code is peer reviewed and tested before deploy. There are so many bugs in software, that this would be a terrible standard to set, and no developer would be safe. The best a company could do is fire them, but even that is unlikely.
> You cannot sue a programmer for writing a bug in software ...
Of course you can.
> It is the company's responsibility ...
Not if the programmer was hired for his independent expertise rather than to be told what to do, and how to do it, by higher-ups. That's a common arrangement in small companies.
> There are so many bugs in software, that this would be a terrible standard to set, and no developer would be safe.
All true, but surely you realize that lawyers and judges don't necessarily ask themselves, "Is this fair?"
Remember that a group of scientists were successfully convicted of manslaughter in Italy for not predicting an earthquake. It was a different country and legal system, but not that different.
> Not if the programmer was hired for his independent expertise rather than to be told what to do, and how to do it, by higher-ups. That's a common arrangement in small companies.
1. That does not excuse software from a proper code review process, in fact any reasonable software company would probably see this as even more of a reason to code review the code. If I knew that a software company doesn't properly review critical code, I would definitely think twice about using that company's software.
2. Apple is not a small company so that doesn't excuse them in this case.
>All true, but surely you realize that lawyers and judges don't necessarily ask themselves, "Is this fair?"
> Remember that a group of scientists were successfully convicted of manslaughter in Italy for not predicting an earthquake. It was a different country and legal system, but not that different.
http://abcnews.go.com/International/scientists-convicted-man....
Come on, this is the Italian legal system. No offense to Italy, I believe it to be a beautiful country, but they have had a shocking record in their legal system as of late. And I highly doubt that this would have stood up in many other western legal systems.
But even if I concede to all your points, someone on HN should be smart enough to realise that suing someone for a software bug would definitely not be a smart move. And we definitely shouldn't be recommending such a thing. Just like we shouldn't be recommending scientists to be convicted of manslaughter for not predicting an earthquake.
>> Not if the programmer was hired for his independent expertise rather than to be told what to do, and how to do it, by higher-ups. That's a common arrangement in small companies.
> 1. That does not excuse software from a proper code review process, in fact any reasonable software company ...
You changed the subject. The original subject was whether a programmer could be held accountable for a bug in a computer program. The answer is yes, under civil law, that can happen.
There are companies whose computer science departments consist of one person, which was my example, and in such a case, there is no company-wide software review process, because -- as I already said -- the computer science department consists of one person working alone.
Umm .. Code-coverage, Apple? That's the most scary aspect of this issue - that there is clear evidence that Apple aren't testing with coverage. At all. Or else this wouldn't have ever made it to release ..
I installed this last night and the update came in at 15MB. Surely a one line bug shouldn't cause a 15MB update? Or is it that some things in iOS might be statically linked and those had to be pushed out as well?
The dynamic libraries in iOS aren't shipped as separate dylibs on the device. Rather, they're essentially concatenated together as part of a single file called the dyld shared cache that also lets Apple do some prebinding tricks, interning of objc selector names across all system libraries, etc.
The system-wide iOS build scripts actually randomize the order of the libraries in the shared cache by default (you can check this using dyld_shared_cache_util against the shared cache file, which you can compile from the dyld project on Apple's open source site). Since the ordering of dylibs in this giant shared cache file varies between builds, you could easily end up with a 15 MB diff even though all you've done is deleted a single goto in source code.
The update probably includes the actual updater executable, not just the diff. This would allow them to ship different kinds of patches (from simple fixes to entire OS upgrades) using the same mechanism.
Can't wait for Apple to patch this bug? Downgrade to 10.8.5 ...
The answer here, folks, is don't update until all the bugs have been found and fixed. Apple's golden days of OS updates were back in 10.6, when they spent most of the time just fixing bugs rather than adding new features. Every OS update since has had as many bugs as it fixed, though overall things were getting better.
It's the old Wait-until-SP1 advice. For security, it's often true. The exception being if the patch cycle is fast enough...I'm not going to say for sure whether one is better than another.
> The answer here, folks, is don't update until all the bugs have been found and fixed.
How can you ever be sure of this? And also I can almost guarantee that there is no non-trivial software that is bug free. As a programmer the only way you can guarantee bug-free code is to not program at all.
So if you follow this advice, you will never update at all, which in my opinion is bad because the benefits of an update outweigh the negatives.
I was being facetious in that one line. I never expect "all" the bugs to be fixed, but it's equally true that if you wait 8 months, you'll know better what you're upgrading /to/. In this case, 10.9 had the bug, 10.8 did not. It's usually the case that as new features are added, it's much more likely that you've added new bugs. Whereas in slightly older code, you might be "out of date" but at least you know where the bugs are, generally, by watching what changes in newer releases and back-port as necessary. And as I think I've said elsewhere, if code is updated frequently, this advice might change -- so long as you're running a stable channel, you can both get new updates and enjoy tested code.
This seems to be the current thread on the topic, so I'll mention here in case anyone cares that iOS 5.1.1 does not contain the vulnerability, according to gotofail.com's test for it.
I'm calling you on the GOTO is evil thing. It's not. This is a well-structured use of GOTO, and it has a long tradition in C shops. This is a well-understood idiom, and you're seeing a different failure.
Your choices for exception handling systems in the C-like languages are:
- early return (whereupon, resource leaks and bloated cleanup code)
- setjmp/longjmp (whereupon, utter chaos)
- "chaining" success, where you don't use GOTO and have to consistently check your current 'err' state before doing additional "work" (whereupon, lots of bugs when you forget to handle an error case)
- or GOTO to a single piece of error handling code (this is usually coupled with an earlier hunk of code where you initialize variables and set up state; this is essentially the C++ "constructor / destructor" idiom, but done by hand).
The GOTO is the best choice, because everything else is worse.
(If you're using C++ exceptions you're in serious trouble. The effort required to write correct code in the presence of arbitrary exceptions is very high, and you're likely to get things wrong. Scott Meyers wrote like three whole books on the subject, which should be strong evidence that something is Very Wrong, and if you've ever used exceptions in a language that doesn't have garbage collection you'll probably agree (and even GC doesn't save you). Most production C++ code that I've seen use exceptions simply catches them in a bug-handling layer, which responds by scribbling some kind of report and then restarting the app).
Often these GOTOs are wrapped with macros that hide the fact that GOTO is being used. This can hurt code quality, since things are less clear, but in general they work well.
Basically they should have had the dead code warnings enabled, and listened to them, and done better checkin reviews. GOTO is not the enemy here. In paranoid code like this, generally you want to "claim success" as late as possible, so setting 'err' to some failure code would have been a better choice.
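A minimal sketch of that idiom (illustrative only, not the Apple code), including the "claim success as late as possible" detail:

    #include <stdlib.h>

    int do_work(void) {
        int err = -1;                /* assume failure until the very end */
        char *a = NULL, *b = NULL;

        a = malloc(64);
        if (a == NULL)
            goto out;
        b = malloc(64);
        if (b == NULL)
            goto out;

        /* ... the actual work would go here ... */

        err = 0;                     /* success is claimed only once everything worked */
    out:
        free(b);                     /* free(NULL) is a no-op, so cleanup is unconditional */
        free(a);
        return err;
    }

Written this way, an accidentally duplicated goto would skip work but still return a failure code, rather than silently reporting success.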
> If you're using C++ exceptions you're in serious trouble.
Sorry, but I'm going to call you in turn on that one.
Of course there are some unwise things you can do in the presence of the C++ exception mechanism, and 15 or 20 years ago plenty of us did them. I think probably Herb Sutter deserves more credit than anyone else for drawing attention to these things, but the point is valid in any case.
However, we've figured out a lot since then, and for a long time writing exception-safe code from scratch in C++ has been quite straightforward. Just follow the common idioms, which mostly boil down to "use RAII for anything that needs cleaning up".
The biggest difficulties with exceptions in C++, IME, generally come from trying to retrofit them into code that was written without exceptions in mind. If your existing code doesn't necessarily follow even basic good practices for writing exception-safe code, the WarGames rule applies.
- Dijkstra said a lot of stupid things, this is one of them. Parsimonious use of GOTO is fine and sometimes leads to clearer code (exiting deeply nested conditions for example).
- Another stupid thing that Dijkstra said is that anyone who started by learning BASIC has their brain corrupted forever, which is total nonsense (I bet most people reading this started with BASIC and turned out just fine).
There is some context missing. The Dijkstra quote is really old (Wikipedia dates it to 1975) and does not talk about the "modern Basic" we know, with good support for structured programming -- it's about a language where GOTOs were used instead of proper loops. It's certainly still an exaggeration to speak of "brain corruption", but I think I know why he put such emphasis on the statement.
First, the problem here is totally different from the goto; it's the unconditional execution of something that was meant to be inside an if clause. It could have been anything, even without a goto, and had the same problem.
Second, this is not the kind of use of goto that Dijkstra warned about. This is perfectly fine, and widely used by the best programmers.
Please don't fall into the trap of believing that I am terribly dogmatical about [the go to statement]. I have the uncomfortable feeling that others are making a religion out of it, as if the conceptual problems of programming could be solved by a single trick, by a simple form of coding discipline!
E. W. Dijkstra, cited by Donald E. Knuth in Structured Programming with go to Statements, ACM Computing Surveys, 6 (4), 1974
Something that concerned me was things like software update daemons connecting with SSL. Can they be compromised? Or is this something that requires connecting to a black-hat server?
It could actually.
Exceptions and RAII are not resilient against typos and programmer error.
If the exception handler caught the wrong type, for example catching by value instead of reference, instead of having a catch(...) it would have the same problem as the above. It's harder to spot this sort of error than a double goto and incorrect parentheses - because you have to check the code that throws and the code that catches.
Admittedly in the scenario I lay out - what happens is an error is thrown and not caught, rather than a check being skipped, but the net effect is equivalent - and it takes 1 character out of place to invoke.
I don't see the possibility for catching the wrong type (or not catching it) if you get one character wrong. Unless of course you have two valid type names that just differ in one character, but that is a general pitfall and not exception specific.
If you always throw by value (and not by pointer) then it does not matter if you catch by value or by reference, the catch block will be activated either way. The only issue is possible slicing if you throw a derived instance and catch a base class type. You might lose error information that way, but the program will not crash due to an uncaught exception.
Well, since you're throwing an int-pointer then catch(int*) should also catch it quite happily.
I guess you mean there will only be a catch(int) in addition to catch(...) so in that aspect you're right of course.
Still, I would think taking the address of something you throw should ring quite a few warning bells as opposed to merely a missing ref (&) in the catch handler. Similarly to "return &e" which would also be suspicious and require an extra look or two.
No, you're worse off in C++, where the semantics of exceptions (especially in the area of resource allocation and deletion) are complex and hard to reason about.
I've done a bunch of kernel and other systems work in C and in C++, and my experience is that the C is a lot clearer. This type of code is all about not having magic side effects; everything needs to be in the open and very plain, or bugs start to get pretty subtle and hairy.
I'm not saying "Don't do systems programming in C++", because clearly you can. But it takes discipline to succeed, probably more discipline than you need if you're writing C.
Well it was mostly a tongue-in-cheek comment (was thinking of yesterday's C++ exception topic), but essentially true as you would not have used the "if + goto" with C++ error handling. Of course you can write the same bug (and more) in C++ too.
I don't know about exceptions (they are forbidden where I work) but any amount of C++ would significantly improve this implementation. There is code all over this library that allocates and deletes buffers in structs passed in as function arguments, i.e. other code's structs that nobody should be dicking around with except through constructors, destructors, and accessors.
This code would also be about half as long in straightforward C++.
In what possible reality was this not the inevitable outcome? The "enough eyeballs" aphorism makes for a nice soundbite but in practice was always a red herring.
Clearly sarcasm. With open source, it is not a matter of whether these mistakes (or whatever this one was) happen or not, it is a matter of likelihood of these mistakes being spotted. It's a statistical thing. It's unfortunate this has to be explained.
The code is crap and while I can understand why a single person might have written it this way, I think any organization of full-time professional software developers should be collectively embarrassed to have accepted it. The only reason to use the "goto fail;" idiom is to free two buffers before returning. But the buffers in question are just sslBuffer structs that live on the local stack. Their destructors will be called, in reverse order, regardless of how the function is exited. If this code simply had a fucking destructor for sslBuffer to delete the data pointer, none of the rest of these call sites for SSLFreeBuffer would need to exist at all! And there's 24 such call sites in this file alone. Anyone with any sense would chose the 2-line destructor over dozens of calls to free buffers.
There's even this bullshit:
if ((err = SSLFreeBuffer(&hashCtx)) != 0)
goto fail;
Which is hilarious, because the only way for SSLFreeBuffer to fail is if the buffer doesn't exist, but hashCtx is a local temporary object and it MUST exist. Anyway the only thing we do by jumping to fail is to free the non-existent object once again.
Yes, one makes a conscious choice of implementation language. They aren't dictated by the laws of physics. Choosing your implementation language is a rather important step.
this pattern is structured to avoid the use of an additional stack (local) variable to track a condition, and to skip superfluous execution of code once that condition is detected.
i've used it myself to optimize inner loops of very simple un-accelerated graphics rendering code (which sped it up considerably, since it is potentially skipping many levels of unnecessary execution on the cpu, millions of times), but i agree with some posters here, using it in a security context like this is a bit daring.
still, having said that, this bug really is an indictment of the testing process and not really bad style per se.
The code already has that condition variable ('err'), the goto is to avoid execution of code that should not be executed. It's not about avoiding additional stack -- that was probably the last thing on the programmer's mind. The idiom is about error handling.