'Heartbleed' contributor denies he inserted it deliberately (smh.com.au)
57 points by jaboutboul on April 10, 2014 | 83 comments


If we start naming and shaming the coder for each flaw instead of working on fixing the process that allowed it to sneak through, we'll see a chilling effect on open source software. There's a reason we have tests and code reviews and security audits...


I happened to be speaking to the very reporter who wrote this article just before he published it. I told him that in 'our world' we don't really emphasize who introduced the bug, since it is understood that writing secure code is hard.

But what required the denial from the developer (and I feel horrible for him) is that this bug gained so much mainstream attention and speculation. It was only a matter of time before a reporter wanting to further the story got in touch with him.

He was in a bad situation: don't respond and have that misconstrued, or respond with a denial and lend credence to the theory.

We all know that he didn't do it intentionally; in the absence of any supporting evidence, these are simple mistakes. But the general public, currently whipped into a frenzy over the NSA revelations, doesn't know that.


> We all know that he didn't do it intentionally

Actually, I hadn't known that. But this article gave me strong reasons to believe that it was, indeed, accidental. I certainly recognize that this is the sort of mistake that is extremely easy to make.


Unfortunately the tech industry seems to be in mob mode recently, so I would think the chilling effect is already well underway. The fact that it was suggested this was done intentionally before anyone knew the whole story suggests as much to me.

Never mind the fact that the code was apparently reviewed, so I guess the reviewer would have been in on it as well.

There are always people who prefer to play the blame game instead of actually working to solve the problem.


Part of me wants to see the team give the industry a big middle finger and quit the project, so that the companies relying on it would have to start spending money to keep it up instead of expecting it to be provided for free.

If something is important to you, you should spend more resources than zero on it.


Banks spend a lot of money on their login stuff. Some of those are really awful. Be careful what you wish for.


Does anyone know of a resource that shows who contributed and how?


That would be great. Maybe we could get some TDD people in there. Some competent C engineers, rather than code cowboys.


Because TDD is the "new" magic bullet, replacing all the magic bullets before it. Also, TDD works great retroactively on large open source projects with vast histories.


Maybe the silver lining here is that it puts the final nail in the coffin for "many eyes make all bugs shallow" - which was always total BS from the day it was uttered. There's so much code out there, much of it highly specialized and even project-specific, that there are very few eyes looking at any particular piece of code, and not all eyes are connected to the greatest of brains.

Most static code analyzers could have caught this particular bug, as it requires only a fairly simple kind of reasoning about allocated vs. used length. In fact I believe it probably was found by static analysis before being reported by a human, which is a shame because it misses an opportunity to highlight the value of such tools. Maybe next time people will spend a little less time designing a logo and a little more time doing things that actually help (though that's a wish and what I expect is the exact opposite).
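For readers who haven't seen it spelled out, the class of flaw being described has roughly this shape. This is a simplified sketch with made-up names, not the actual OpenSSL code:

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical record: the peer claims how long its payload is. */
    struct record {
        unsigned char *payload;     /* bytes actually received          */
        size_t         payload_len; /* how many bytes were received     */
        size_t         claimed_len; /* length field taken from the wire */
    };

    unsigned char *build_echo(const struct record *rec)
    {
        /* Reply buffer sized by the *claimed* length... */
        unsigned char *reply = malloc(rec->claimed_len);
        if (reply == NULL)
            return NULL;

        /* ...and a copy of that many bytes, with no check of claimed_len
         * against payload_len. If claimed_len > payload_len, the copy
         * reads past the received data and leaks adjacent heap memory.
         * The missing guard is one line:
         *   if (rec->claimed_len > rec->payload_len) { free(reply); return NULL; }
         */
        memcpy(reply, rec->payload, rec->claimed_len);
        return reply;
    }

In principle, an analyzer that tracks allocation sizes and attacker-influenced lengths can flag that memcpy, though as the Coverity discussion further down in this thread shows, doing so without drowning in false positives is harder in real code.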

The real lesson here is that we should apply as many different tools and processes as we can at improving code quality for critical infrastructure. Code reviews are nice. Static analysis is nice. Detailed tests are nice. However, none of these alone is sufficient even when pursued with fanatical devotion.


>> Maybe next time people will spend a little less time designing a logo and a little more time doing things that actually help (though that's a wish and what I expect is the exact opposite).

So you want the graphic designers contributing to crypto code?

As someone who is a mid level programmer but a decent/good designer, the only way I'm able to contribute to some projects is through things like logos.

Don't use this straw man argument to make your point, which is otherwise valid.


It sure would be nice if some of those designers would take a whack at the Mozilla certificate configuration UI. It hasn't meaningfully changed in a decade, it's one of the Internet's biggest security flaws, and it's something any UX person could safely propose changes to.


"It's one of the Internet's biggest security flaws" - aww, you're killing me here :)

We have looked at it. Repeatedly. First when porting Chrome to Linux (and trying to decide what sort of UI there should be for NSS cert management). Then for ChromeOS. And then again, revisiting it in light of the OpenSSL migration.

The reality is, the certificate management/configuration UIs of all platforms (Windows, OS X, Linux-via-Firefox-and-Chrome-which-just-copies-Firefox's-PSM) are hard. It's hard to express the decisions. It's hard to explain how the branching and cross-certifications of CAs can affect trust decisions (e.g., recall DigiNotar's cross-certifications).

And in the end, it's not where users are spending their time or being tripped up. Getting the lock icon into a more meaningful state is a far, far greater investment, with far, far more actionable returns for users' security. The work of the Chrome Security Enamel team on this - especially Adrienne Porter Felt - has been focused on trying to improve this.


I understand why you're writing this. You think I'm saying, "the UI that users use to modify the root certificate store is so bad that it's the Internet's worst problem".

I'm not saying that. I'm saying that the fact that there's only a UI to configure your root store, and not a UI to:

* set policy for which CAs you trust with which domains

* subscribe to policies published by trusted third parties

* monitor which sites bind you to which CAs

* easily opt in and out of trusting different CAs

... is the big problem. It's a UX problem, not a simple UI problem. There are additional features that need to be added, and that can be added, without litigating the whole CA system.


A lot of projects have no designers and the programmers spend a lot of time tweaking atrocious logos. I don't think that was meant to be a slight against designers. Probably the opposite, really: don't waste time screwing about with a logo if you don't have the talent, concentrate on the programming and either go without or find someone who can do it well.


> Maybe the silver lining here is that it puts the final nail in the coffin for "many eyes make all bugs shallow"

The OpenSSL project doesn't seem to have a 'review' step[0] in code commits, unlike FreeBSD, OpenBSD and Chromium (I'm just picking projects I'm familiar with, no bias).

So it might not have been 'many eyes', but the eyes of just one.

edit: [0] a proper review step; what the OpenSSL project does is take feedback on commits (which is what the OP article is referring to). See below.


The "many eyes" canard isn't just about formal reviews, though. It's also about the idea that when something's open source the reviews aren't limited to maintainers or close colleagues who would be involved regardless of open vs. proprietary. What actually seems to happen in the great majority of cases is that the circle of people who actually understand any particular commit is only very slightly greater than it would be otherwise. The bugs are only a very tiny bit shallower, if at all. The fact that it was just one reviewer in this case, despite the code being widely available, only underscores that point.


The article says there was a reviewer who looked at this code before it was moved to the release branch.


It refers to 'review' as in reviewing the features and how it is implemented. Here is the log:

http://rt.openssl.org/Ticket/Display.html?id=2658&user=guest...

That is very far from being a security review, or any real, proper review. It is more a request for comments (which defaults to the code being committed).

You'll notice that the workflow for commits doesn't have a review stage, and it isn't implemented in the workflow software.

Compare to a chromium example, where each commit, eg.

http://src.chromium.org/viewvc/chrome?view=revision&revision...

has a review stage that the original committer doesn't have permissions for. In this case 3 reviewers for that commit:

https://codereview.chromium.org/231923002

Code doesn't get committed without at least 4 people; it's like a two-man rule that is enforced in the workflow.

In contrast, the OpenSSL process is that the same committer can push the code after getting no objections. There is a big difference in how these two processes work.


A lot of what Linus says only really applies to linux kernel development.

And some of that only applies when you're Linus.


> Most static code analyzers could have caught this particular bug

Does that mean nobody out there in the crypto world ran a static analyzer on OpenSSL in the last 2 years?


... And wasn't the Debian bug caused by someone running a tool against the code and trusting the tool?


Yeah, and I've heard that one guy died because he got tangled up in his climbing rope too. Those things are dangerous, nobody should use them.


OpenSSL's code uses particularly bad macros and ifdefs that defeat static analyzers' ability to analyze the code...


I'm not sure how much those macros affect anything here. For one thing, most static analyzers are actually very good at cutting through that kind of crap, running their analyses on code that has already been preprocessed using exactly the same command-line flags the real compiler saw.

Even if that weren't the case, I think this particular bug would still qualify as low-hanging fruit. It doesn't involve a lot of macros. It doesn't involve dynamically assigned function pointers. It's not limited to one execution through multiple iterations of a complex loop. I'm painfully familiar with the constructs that can defeat static analysis, and none of them seem present in this case. The code allocates a buffer of size N, then reads from an offset that's not checked to be less than N. That's kind of Static Analysis 101.

My biggest worry here, TBH, is that static analysis was done, and this was flagged, but it was buried among so many other reports - many of them false negatives - that nobody paid any attention. That would be truly sad.


Meant to say "false positives" but it's too late to edit. IRTE


I don't know; this seems like a clear-cut case of "not enough eyes" to me.


I believe it. Crafting a weapon that can be used by both you and your enemies would be pyrrhic.

The lesson out of this ordeal is probably to be as skeptical as possible of everything you take for granted. How do you know that you're secure? What if your assumptions are wrong? Try to invent ways to break your own assumptions. The best way to protect yourself is to try to defeat yourself.

Unfortunately, "ain't nobody got time for that," as they say. But if you find time, it's quite rewarding. And disconcerting. You'll wonder why we're still wrestling with these fundamental problems in 2014, and then you'll start questioning the foundations we've been relying on until now.


> Crafting a weapon that can be used by both you and your enemies would be pyrrhic.

In the long run, yes. In the short run, only you know about it and can exploit it from day 0; others will presumably take some time to find it, and it may go unnoticed for years or never be found (I understand the error was only present in some versions of OpenSSL, and both older and more recent versions were free of it). I'm inclined to think the author made a simple mistake, but this does not seem like a sufficient deterrent against someone introducing such a vulnerability.


Strangely, nobody's tracking down the nginx developer who inserted the exact same bug into nginx just a few years back (a NUL in a header would cause the header copies, done using strncpy, to abort early and expose uninitialized memory). He must have been an NSA plant too, right?


Why aren't we doing exactly that? While it's best not to attribute to malice what could easily be explained by human stupidity, shouldn't we try to rule out malice?


Yes! More witch hunts! That's what I was suggesting!

Seriously, this place is like Lord of the Flies sometimes.


I was recently moderated[1] by dang for attempting the same aggressive sarcasm you are now exhibiting. You might do well to heed the same advice I was given. With that in mind, please forgive any misinterpretation of your position on my part.

Now, perhaps you could answer the original question I was asking: shouldn't we rule out malice?

[1] https://news.ycombinator.com/item?id=7542802


Any C programmers who have never failed to check their bounds? Anyone? Anyone? Because everyone has done it.

Now would be a nice time for one of the lint vendors to donate copies of their product to some of the OpenSSL team members, and for them to dedicate some resources to fixing the more important stuff it finds.


Or better yet, start seriously talking about moving to a language that doesn't make it so stupendously easy for trivial bugs like these to get so far out of hand.

Bounds checking incurs such a trivial amount of overhead, and the potential cost of straying out of bounds is so high. I understand we've got decades' worth of C culture built up around coding idioms that are incompatible with type safety and bounds checking. I understand that means that developing and migrating to a dialect of C or C++ that offers something in the way of a safety net is anything but trivial. But we've also got decades' worth of painful examples of just how catastrophic - and difficult to prevent - these kinds of errors can be, and that alone should justify the cost.
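Even without leaving C, the kind of safety net being argued for can be approximated with a length-carrying buffer type and checked accessors. A minimal sketch of my own, not taken from any particular codebase:

    #include <stddef.h>
    #include <string.h>

    /* A buffer that carries its own length around. */
    struct slice {
        const unsigned char *data;
        size_t               len;
    };

    /* Checked copy: refuses to read more than the source actually holds.
     * Returns 0 on success, -1 if the request would run past the end.   */
    static int slice_copy(unsigned char *dst, struct slice src,
                          size_t offset, size_t n)
    {
        if (offset > src.len || n > src.len - offset)
            return -1;
        memcpy(dst, src.data + offset, n);
        return 0;
    }

The check costs a couple of comparisons per call, which is the "trivial amount of overhead" referred to above; the hard part is the cultural one of using such a type everywhere instead of passing bare pointers and lengths around separately.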

At the very least, consider the example set by C#: Be type safe and do bounds checks by default, but provide an `unsafe` keyword for those cases where it really is necessary to disable them.


I can't imagine what this guy must be feeling right now. I find it embarrassing enough when I am outed in my small team for producing a bug that makes it into production. To be known around the entire internet to have caused the largest security bug in recent times must be quite a slammer.

I really hope it doesn't affect his career..


The armchair crypto brigade aren't helping much, either, by passing around these completely unfounded accusations of malfeasance.

I'm all for being aware of the possibility that shenanigans are involved, but until proof comes out, it is nothing but a possibility, and a remote one at that.


"Never attribute to malice that which is adequately explained by stupidity." Obviously "stupidity" is way too strong here, but the principle still applies.


In this case I would replace "stupidity" with "assumption". It was code-reviewed, so at that level correctness was assumed. It's in OpenSSL, which is written by experts, so the millions of us who use it every day assumed correctness. Our customers know little about the workings of the internet, so they assume correctness.

This, incidentally, is the same sort of thinking that sometimes kills experts in dangerous situations: the assumption that someone smarter than us must by definition know more than us, so we can be lackadaisical. Witness Snow Fall (http://www.nytimes.com/projects/2012/snow-fall/?forceredirec...)


The rest of the quote is "...but don't rule out malice."


I'm actually more worried that this will result in a civil lawsuit from one of the affected companies. In an ideal world[1], it would get thrown out immediately, but I'm a bit worried. That would have a heck of a chilling effect.

1) and a simple reading of the license


Is it 'rayiner who tells us to always have the disclaimer for liability in our licenses? Whoever it is, this is why.

I think any company that is damaged by this should get all their money back, all $0 of it.


My only worry is some nasty lawyer will figure out a way around the disclaimer. I am hoping not and that it has been well established, but the courts are often a bit insane.


He works for T-Systems, which belongs to Deutsche Telekom. His job is pretty safe, I would say. People working for Deutsche Telekom do so under conditions similar to those of German civil servants when it comes to termination.


Let the programmer who has never written a bug cast the first stone.


I wouldn't be surprised if it was good for his career in the long run. Smart people will know that bugs like this can happen to anyone, and he'll have huge name recognition. As long as he's humble about it, "Yeah, I'm the guy who broke OpenSSL. Man, that sucked." I could see it being a big plus in interviews after a year or two.


I'd be amazed if he doesn't break from the pressure and/or abuse.


It may be impossible to distinguish genuine bugs from bug-backdoors, which is why it's important to start developing crypto in safer frameworks and languages. C considered harmful.


The problem is that GCs are considered harmful to crypto code, though that may be changing with Go. We won't know for about 5 years whether it's trustworthy, and even then, the side channel threat posed by GC may be worth worrying about.

Some big-name cryptographers have started implementing some useful crypto services in Go, so we'll see whether it catches on.

EDIT: Is the JVM generally trusted by cryptographers? I just realized that TextSecure is Java code. Is the side channel threat posed by GC not too serious, then?


Rust provides memory safety without a GC, which seems more promising than anything else for this sort of application.


C vs GC is a false dichotomy.

There can be safer-than-C languages that don't use GC, e.g. you don't need GC for array bounds checking. You can use reference counting for more deterministic garbage collection.


Refcounting unfortunately fails for GC because it can't handle cyclic dependencies, e.g. object A references object B which references object A, now neither object will ever be freed even if nothing else references A or B.
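A minimal illustration of the cycle problem, as a toy C refcount (names made up):

    #include <stdlib.h>

    struct node {
        int          refcount;
        struct node *other;      /* strong reference to another node */
    };

    static struct node *node_new(void)
    {
        struct node *n = calloc(1, sizeof *n);  /* assume success for the toy */
        n->refcount = 1;
        return n;
    }

    static void node_release(struct node *n)
    {
        if (n && --n->refcount == 0) {
            node_release(n->other);             /* drop our reference first */
            free(n);
        }
    }

    int main(void)
    {
        struct node *a = node_new();
        struct node *b = node_new();
        a->other = b; b->refcount++;            /* a -> b                  */
        b->other = a; a->refcount++;            /* b -> a: the cycle       */

        node_release(a);                        /* count drops to 1, not 0 */
        node_release(b);                        /* same; both nodes leak   */
        return 0;
    }

Both counts end at 1, so neither object is ever freed even though nothing outside the pair references them. Cycle collectors, weak references, or breaking the cycle by hand are the usual ways out.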


Objective-C programmers have been using refcounting at scale for at least 2 decades.

No problems here.


I wouldn't say "no problems here" at all. Memory management errors are extremely common in Objective-C code. This is true even with ARC. The problems can be solved, but they are certainly there.

The question of cycles is even more pertinent with ARC, because it's easy to get used to letting the compiler do everything for you, and it's easy to miss an occasional leak of a cyclic object graph.


How is it possible to use refcounting for GC while also correctly handling cyclic references?

EDIT: Objective-C doesn't solve the problem at all. (More precisely, it requires manual intervention by the programmer.) See http://stackoverflow.com/questions/6260256/what-kind-of-leak...

From the article: "This occurs when one object has a strong pointer to another, but the target object has a strong pointer back to the original."

Some background info on Objective-C's memory management: http://stackoverflow.com/questions/7874342/what-is-the-diffe...

It's an interesting approach to insert release calls at compile time. Maybe it's worth not solving the cyclic reference problem in exchange for determinism. Thank you for pointing that out.


You don't need to automatically solve all the problems to get a safer language than C/C++.

I don't think cyclic object graphs are needed to, say, respond to a TLS heartbeat without out-of-bounds reads.


"The problem is that GCs are considered harmful to crypto code, though that may be changing with Go."

I'm curious what about Go's GC leads you to say that? Is it that when you get a new chunk of bytes, they've been pre-zeroed? That's not specifically a characteristic of the GC but it's my best guess. Otherwise I don't know what it would be, which is why I ask.


I don't know enough about it to say anything more than "I noticed some big-name cryptographers have begun writing crypto services in Go, so maybe it's safe." Trusting expert cryptographers to make safe decisions and then waiting five years for cryptanalysis to catch up is probably a safe course of action. (Moreso the "wait five years for cryptographers to poke holes in it" part.)

I remember some discussion on HN where someone mentioned that GCs pose a problem for secure code due to the non-deterministic nature of GC. There's apparently a nonzero chance of introducing a side-channel attack via GC. Until we're certain that chance is far closer to "zero" than "non-zero," we should either study that threat vector in detail or find a safer option.

It would be horrible to switch to something which is then proven to be broken in some fundamental and unfixable way, such as a GC side channel attack.


> The problem is that GCs are considered harmful to crypto code

Could a GC'ed language not just pause the GC thread while it executes a given block of code?


Compare with the attempted 2003 Linux kernel backdoor [1], which gave root when sys_wait4() was passed an uncommon combination of flags. The bug looked inconspicuous and honest enough.

[1] http://lkml.iu.edu//hypermail/linux/kernel/0311.0/0635.html
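For anyone who hasn't seen it, the inserted check was, as I recall, roughly this (reconstructed from memory; see the lkml link above):

    /* in sys_wait4(): note the single '=' in 'current->uid = 0' */
    if ((options == (__WCLONE|__WALL)) && (current->uid = 0))
            retval = -EINVAL;

It reads like a routine sanity check, but the assignment sets the caller's uid to 0 (root) and evaluates to false, so the "error" branch never fires and the only observable effect is privilege escalation for whoever passes that odd flag combination.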


Yeah most of my bugs are deliberate. I like the feeling of making mistakes and seeing my boss run out of his office frantic.


Just wait till you're told: it's running too fast, so the client won't perceive it as "serious" software... slow it down.


The problem is not that bugs happen, but that certain categories of bugs still happen when we have means to avoid them altogether.


We lack perfection, and we are not always programming at peak form; even when we are programming at near-peak (personal) efficiency, we are still not perfect. This bug even made it through a code review.

Makes me wonder how many serious bugs I have pushed to production in the last year or two. Yikes.


Would static code analysis or something like that have caught this bug?


My priors on this being deliberate are pretty low, but I'd expect this denial either way so it really isn't evidence against the idea.


Hopefully one positive thing that will come out of this whole Heartbleed affair is that companies making extensive use of open source software for security-critical purposes will consider contributing to ensure that security reviews are carried out on it.

The cost of regularly reviewing codebases like OpenSSL would likely not be that high when compared to the potential impact of a breach because of a flaw in the software.


I just can't understand why such critical components as OpenSSL don't use static analysis tools like Coverity to find things like this. Testing, coverage certification, static analysis: this would have been caught if these tools were being used.


It would shock me if those tools could actually have caught this bug. That would imply that no benevolent contributor or researcher had run them against OpenSSL in the last two years, which I find incredibly hard to believe, as static analysis is a logical first step when researching code for vulnerabilities, and OpenSSL is probably in the top 5 most valuable pieces of software to validate.


Someone commented (I can't find it now) that post-disclosure they ran 5 or 6 analysis tools and one of them flagged it.

They didn't say how much else got flagged, and how many false positives there were. It's very easy to retrospectively look through the list and say "oh yeah, there it is, amongst this big pile of nonsense".


I just ran Coverity on code that is similar to Heartbleed's bug, and it caught it. So I think this is really just a glaring omission on a lot of people's part - but like you I also find it hard to believe nobody ran this code through static analysis. It's mind-boggling.


I don't know how Coverity works - is it possible that surrounding it with more complex code will cause it to miss the bug because it doesn't have enough information to not constantly raise false positives?


To put it briefly: no.

A little less briefly, then. ;) It is absolutely possible to create code that these tools in general, and Coverity in particular, will have difficulty analyzing . . . but you really have to work at it. Seriously, these guys are good. It's a bit like the "arms race" in building vs. breaking crypto. These guys have been there, they've seen all the moves, they know all the countermoves. Sure, if you load up your code with runtime-assigned function pointers and code that only executes if the last five iterations of a loop each went through specific code paths themselves, then that's going to cause some problems, but most programmers are unlikely to "win" that battle.

However, this particular bug looks like it's in the absolute easiest category. Any static analyzer should have caught it. As others have pointed out, the real problem is false positives. If it was caught, but the report was buried in hundreds or thousands of crappy reports about things that actually aren't problems, then it might as well not have been caught. That's why the pros at this spend as much time writing code to eliminate false positives as they do writing code to find new things. In every project I've worked on that used static analysis, the weak link in the chain has been between reporting and remedy, not in the analysis itself.


I'm not familiar with Coverity, but Sonar (for the Java world) is very similar in concept and will fail to detect things if there are enough issues before it in a file...

Which I would HOPE is not the case here.


From my experience with static analysis tools it's very easy to write perfectly valid code that the tool doesn't like, so it wouldn't surprise me too much.

Of course, the argument can often be made that if it isn't clear enough for the tool to find, it's not clear enough for a person to understand quickly.


So you refactor until the noise goes away. Why throw out the signal with the noise? Modern software engineering theory 101.


I absolutely agree it would be beneficial for static analysis tools to regularly be run on openssl.

My stance (which I made clearer elsewhere in the thread) is more along the lines of: if it's not being done by the core maintainers, but just by concerned third parties, it's very easy to lose the signal in noise you don't have the ability to refactor away (because of time, difficulty getting it merged upstream etc.)

So I'm not surprised that given the context it was missed by people running static analysis over it. That context is wrong, and it should have changed a long time ago, but under that context I can see it getting missed [1].

[1] By interpreting static analysis results, not necessarily by the code author and reviewer.


This thread here:

http://openssl.6102.n7.nabble.com/Coverity-coverage-of-OpenS...

mentions that one OpenSSL developer used to see defect reports from Coverity (probably through Coverity's scan project). He states:

"Coverity used to, and perhaps still do, run scans of OpenSSL, which we had (have?) access to. I used to look at them and fix relevant ones, but got irritated with the false positive level in the end.

If Coverity were interested in fixing their bugs, I might get interested in looking at their reports again."

Of course this doesn't demonstrate that Coverity found this particular problem, and since he doesn't state what the false positive rate was it's difficult to know how reasonable his statement is.


According to Andy Chou of Coverity on Twitter https://twitter.com/_achou/status/454287263917043712 - "We looked at the code and it turns out the bug is caused by something we don't currently detect." and then goes on to say "But we're going to look into it further. In cases like this it is not only about finding the defect it's also about not flagging hundreds/thousands of false positives at the same time"

So no, Coverity wouldn't find it.


Does someone know the actual line of source where the error appears? The goto fail vulnerability was easy to catch with static analysis, but I haven't seen the actual lines of code for this one yet.


The problem with most static analysis processes is the typically vast number of false positives.


I work with both private and public sectors in the DC area.

One of the things that sucks about the Federal sector is that it is dominated by Microsoft and Oracle shills (the kind of IT pros who can't learn new skills unless they're spoon-fed, pre-digested, in the form of industry certification training) who do nothing but scream about the danger of open source. Now of course we all know that the only difference between enterprise and open source security holes is that the former go undiscovered for longer, and when discovered by the code owners they aren't disclosed, despite the owners knowing that black hats know about them...

But make no mistake. This fucking idiot [Edit: Ok, this isn't fair, this could happen to anyone, so he's not an idiot, but why not use a static analysis tool?] who did this to OpenSSL and the idiots who let it happen are going to set open source in the Federal government back YEARS. Not because it's an actual threat, but because it will be used by the enterprise assholes as a weapon to keep selling their shitware to the risk-averse morons who make up the giant pile of middle-manager idiots that composes the Federal government.

I've already told my boss I'm not doing any more public sector work after my current project ends, and this is the nail in the coffin for me touching it ever again.


So your theory is that nobody in the last couple years ever ran Coverity or Fortify on one of the top 5 security codebases on the whole Internet?



