

The Heartbleed Aftermath: all CloudFlare certificates revoked and reissued - jgrahamc
http://blog.cloudflare.com/the-heartbleed-aftermath-all-cloudflare-certificates-revoked-and-reissued

======
3JPLW
Really great writeup. Here's the most astounding part to me: "[Rubin Xu]
decided to use Coppersmith's Attack[1] to derive the private exponent. In this
attack, only pieces of the private key are needed and the private key is
reconstructed mathematically. This attack is so efficient that he managed to
extract the private key from only the first 50 dumps!"

That's no more than ~3MB of data. Some of the other winners hit the server
with hundreds of thousands of requests which might have gotten noticed. Doing
this with only 50 heartbeats (which could be spread across many days) makes
the "undetectable attack" even less detectable.

[1]. Coppersmith's Attack:
[http://en.wikipedia.org/wiki/Coppersmith's_Attack](http://en.wikipedia.org/wiki/Coppersmith's_Attack)
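
For anyone checking the "~3MB" figure, here is a rough back-of-the-envelope sketch. It assumes every one of the 50 responses leaked the maximum amount the 16-bit heartbeat payload length field allows; the constant names are mine, not from the writeup.

```python
# Rough upper bound on the data pulled in 50 heartbeat responses.
# Assumption: each response leaks the maximum payload, which is capped
# by the 16-bit payload length field in the heartbeat message.
MAX_LEAK_BYTES = 2**16 - 1  # at most 65535 bytes per response
DUMPS = 50

total_bytes = DUMPS * MAX_LEAK_BYTES
total_mib = total_bytes / 2**20

print(f"{total_bytes} bytes, about {total_mib:.2f} MiB")
```

Real responses may well have been shorter, so this is a ceiling rather than a measurement.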

~~~
vizzah
Would love to see more details on the practical implementation of this attack
/ source code. Whilst detecting primes in random data is feasible, can't
imagine how chunks of returned data were "unrandomized" for this type of
attack.
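
On the "detecting primes" half of this: the naive search needs no unrandomizing at all, because a factor sitting in memory is just raw bytes, and trial division against the public modulus tells you when you have hit one. A minimal sketch, assuming the prime is stored contiguously in the dump; `find_factor` and the toy numbers are made up for illustration and are not taken from any winner's code.

```python
def find_factor(dump: bytes, n: int, prime_bytes: int):
    """Slide a window over `dump`; return a nontrivial factor of n, or None."""
    for offset in range(len(dump) - prime_bytes + 1):
        window = dump[offset:offset + prime_bytes]
        # The byte order in memory is not known up front, so try both.
        for byteorder in ("little", "big"):
            candidate = int.from_bytes(window, byteorder)
            if 1 < candidate < n and n % candidate == 0:
                return candidate
    return None


# Toy demonstration with two small Mersenne primes (real keys use 512+ bit primes).
p, q = 2**61 - 1, 2**31 - 1
n = p * q
filler = bytes(range(64))  # stand-in for unrelated heap contents
dump = filler + p.to_bytes(8, "little") + filler

factor = find_factor(dump, n, prime_bytes=8)
print(factor == p)
```

This only works when a whole factor is present in one dump. The partial-key case is where Coppersmith comes in.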

~~~
tptacek
Coppersmith is complex enough that I'd be impressed to see working
implementations even in lab conditions; it's one of the lattice reduction
crypto attacks.

~~~
pbsd
It's really not that bad with today's computer algebra packages. Here's a
simple example in Sage that works most of the time:

    
    
        # Toy RSA modulus built from two random primes below 2^512
        p = random_prime(2**512)
        q = random_prime(2**512)
        n = p*q
    
        P.<x> = PolynomialRing(Zmod(n))
    
        # Case 1: top bits known; the unknown low 224 bits are a small root of f mod p
        partialp = p - p % 2**224 # most significant 288 bits
        f = x + partialp
        root = f.small_roots(X=2**(224), beta=0.5, epsilon=2**-5)
        assert(root[0] + partialp == p)
    
        # Case 2: low bits known; divide through by 2^288 (mod n) so f stays monic
        partialp = p % 2**288 # least significant 288 bits
        f = x + partialp * inverse_mod(2**288, n)
        root = f.small_roots(X=2**(224), beta=0.5, epsilon=2**-5)
        assert( root[0]*2**288 + partialp == p )

~~~
vizzah
Cool. Would this approach work if 'partialp' is not coming from the starting
or ending bits of the targeted prime number?

~~~
pbsd
Yes. But it makes for a worse didactic example, since Sage does not support it
out of the box.

Imagine you know the middle 288 bits of p. Then you have to find small roots
of the equation

    
    
        x1*2**400 + partialp + x0,
    

which now has 2 variables x0, x1 < 2^112. This requires building a different
lattice than in the single variable case, but the complexity of the attack is
not very different.

Also note that it doesn't have to be p or q. Bits of d, d mod (p-1), and d mod
(q-1), and possibly other intermediate values of RSA computations are also
usable for this.
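
To make the bookkeeping in the middle-bits case concrete, here is a plain-Python sketch (not Sage, and not the lattice step itself). It only checks that a 512-bit value splits into the two small unknowns plus the known middle chunk as in the equation above. `split_known_middle` is a hypothetical helper, and the random value stands in for p, since primality does not matter for the decomposition.

```python
import secrets


def split_known_middle(p: int, low_bits: int = 112, known_bits: int = 288):
    """Split p into (x1, partialp, x0), where partialp is the known middle
    chunk left in place, i.e. already shifted up by `low_bits`."""
    shift = low_bits + known_bits   # 400 with the defaults
    x0 = p % 2**low_bits            # unknown low bits
    x1 = p >> shift                 # unknown high bits
    partialp = p - x1 * 2**shift - x0
    return x1, partialp, x0


# Stand-in for a 512-bit prime; only the bit layout matters here.
p = secrets.randbits(512) | (1 << 511)
x1, partialp, x0 = split_known_middle(p)

assert x1 * 2**400 + partialp + x0 == p
assert x0 < 2**112 and x1 < 2**112
print("decomposition holds")
```

The attack itself then has to find the pair (x0, x1) as a small root modulo p, which is the part that needs the different lattice.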

------
dfc
I love the reaction from ISC's handler on duty. Globalsign's CRLs were
temporarily removed from the dataset until they could verify it was "real":

    
    
      > Update: We temporarily stopped reading
      > “http://crl.globalsign.com/gs/gsorganizationvalg2.crl” . It had > 30,000
      > revocations today, drowning the rest of the data. Trying to figure out if
      > this is real. (4/16/2014 7:30pm)
      >
      > Update 2: I used the online chat to talk to GlobalSign tech support and
      > they confirmed that it is real and related to the revocations due to
      > Heartbleed. Adding them to the graph again. (4/16 8pm ET)

~~~
filmgirlcw
Yes! I laughed at that too. Though it makes sense that a sudden spike in data
like that would look off, especially when it dwarfs everything else.

------
filmgirlcw
As much shit as CloudFlare took for the way they humble-bragged about finding
out about Heartbleed early, I have to give kudos to the team for doing the
right thing and reissuing and revoking all the certs, as well as detailing the
exploit and offering up the test site.

I'll admit, when the CloudFlare Challenge launched, the cynical part of me saw
it as merely a way for the company to convince itself that it didn't need to
reissue and revoke all of its certs (and I maintain that is exactly what it
was), but the way the company handled things after the challenge was won is
what really impressed me.

They didn't have to be that transparent. Moreover, the test server has
provided a way for Filippo Valsorda to improve his excellent Heartbleed
checker, so well played.

~~~
tptacek
I'm a little irritated by the challenge not because of any delay in rotating
certs, but because its whole premise was invalid; the challenge was
accompanied by a long blog post that presumed a model for how RSA keys hit
heap memory that appears to have been self-evidently false (again: I might be
wrong here).

Or, to put it differently: Akamai, which attempted to write a secure heap to
contain memory corruption flaws in OpenSSL, immediately conceded Willem
Pinckaers' observation that they weren't protecting the CRT intermediates.
Akamai seemed to believe ahead of time that keys weren't simply loaded once
and held.
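
For readers wondering why unprotected CRT intermediates matter: a single one of them is enough to factor the modulus. A minimal sketch with toy parameters, assuming the attacker has recovered dp = d mod (p-1) in full and knows the public (n, e). `recover_p_from_dp` is a name I made up for illustration, and `pow(e, -1, m)` needs Python 3.8+.

```python
def recover_p_from_dp(n: int, e: int, dp: int):
    """Factor n given the CRT exponent dp = d mod (p-1).

    Since e*dp = 1 (mod p-1), we have e*dp - 1 = k*(p-1) for some
    1 <= k < e, so trying every k is cheap when e is small (e.g. 65537).
    """
    t = e * dp - 1
    for k in range(1, e):
        if t % k == 0:
            candidate = t // k + 1
            if 1 < candidate < n and n % candidate == 0:
                return candidate
    return None


# Toy key: two small Mersenne primes stand in for real 1024-bit factors.
p, q = 2**61 - 1, 2**31 - 1
n, e = p * q, 65537
dp = pow(e, -1, p - 1)  # the CRT intermediate an attacker might find in memory

factor = recover_p_from_dp(n, e, dp)
print(factor in (p, q))
```

So a heap that guards only the primes and the full private exponent still leaves the key exposed if values like this sit in ordinary memory.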

There are lots of smart people working at Cloudflare, but I think the
"challenge site" approach was a waste of time. The stuff John Graham-Cumming
posted today, showing bitmap diagrams of the memory space of OpenSSL
processes, would have been much more productive as a starting point. Diagrams
like that are a standard part of reverse-engineering practice. They should
have started there, and not with the challenge site.

We all make mistakes (some of mine are easy to find). I'd just say that a
lesson to learn from this is, "err on the side of greater severity, not
lesser."

(Is what I think.)

~~~
jgrahamc
There's one significant difference I think between what Akamai did and what we
(CloudFlare) did.

Akamai thought their memory allocator meant they were safe and so didn't
change their SSL keys; we weren't convinced by our own analysis (which I was
involved in), launched the challenge, and simultaneously started the process of
revoking keys.

In the challenge blog post we said: "We’ve begun the process of reissuing and
revoking the keys CloudFlare manages on behalf of our customers. In order to
ensure that we don’t overburden the certificate authority resources, we are
staging this process. We expect that it will be complete by early next week."

That wasn't BS, we were doing that and we had a _ton_ of keys to deal with.
Today's blog post is a result of further analysis and the announcement that we
revoked everything.

BTW I'm not done with analysis of OpenSSL memory. There's still something
weird going on (I think in the bignum library) where data isn't getting
OPENSSL_cleanse'd from memory.

~~~
tptacek
I am "a little bit" not a fan of waiting to rekey. A lot of people did that.

I am "a lot" not a fan of setting up a challenge game to have people try to
pull the key with one hand figuratively tied behind their back, when a
debugger or memory dump would have decisively answered the question --- and,
presumably, did exactly that for at least some of the winners.

I am "severely" not a fan of _marketing_ skepticism about the severity of a
flaw. This is almost vulnerability response 101: it's the reason Cisco and
Microsoft tag memory corruption flaws "possible remote code execution" even
when they're obviously not exploitable. Err on the side of severity.

As evidence for my concern, I'll again observe that Bruce Schneier's read of
the CloudFlare Challenge blog post was that your team believed it to be "next
to impossible" to extract RSA keys.

As you know, very few people in your audience have the technical background to
evaluate that claim, especially when it's accompanied by heap metadata
diagrams.

I don't think any of this is earth-shattering or even all that controversial.
I would just hate to see a standard response to complicated vulnerabilities
become "set up a challenge site". Challenges are great! You guys should do
more of them! Just don't do them to answer critical security questions. :)

 _Edited to sound less petulant._

~~~
eastdakota
Worth noting that the cost imposed on the Internet's infrastructure of mass
revocation is non-trivial:

[http://blog.cloudflare.com/the-hard-costs-of-heartbleed](http://blog.cloudflare.com/the-hard-costs-of-heartbleed)

Just CloudFlare revoking all the certs on our network added 40Gbps of net new
traffic to one CA. Mass revocation without making sure all systems were ready
to handle it would have also been irresponsible. Props to Globalsign (our
primary CA partner) for working with us to make sure we could pull it off.

~~~
tptacek
That is a totally understandable reason for why it took many hours or days to
revoke keys. I'm not really faulting you for that. My issue here is different,
more subtle, and nerdier.

