
Ron was Wrong, Whit is Right - sew
http://eprint.iacr.org/2012/064.pdf
======
jalapeno_dude
Posting what I said in the other thread: As far as I can tell, this is what
they're doing: if two different keys have a factor in common (i.e. A=PQ,
B=PR), then you can use Euclid's algorithm (which requires only repeated
division with remainder, and is thus really fast) to find P (=gcd(A,B)), and
then just use division to find Q (=A/P) and R (=B/P) easily. So what the
researchers did, apparently, was to gather all the RSA public keys they could
find (6 million or so) and then calculate the gcds of all pairs of keys.
(Presumably they were slightly more clever than this, but it doesn't really
matter--doing this is still vastly easier than factoring even a single public
key). Whenever they found a gcd that wasn't equal to 1, they'd cracked (at
least) 2 keys. And it turns out that roughly 0.2% of the keys could be cracked
in this way (for a total of 12720 1024-bit moduli). As far as I can tell, the
problem isn't with RSA itself, but with whatever algorithms are used to pick
the two random primes that get multiplied to produce the key (or modulus,
whatever)--they don't have enough entropy, so duplicate primes turn up in
~0.2% of keys.
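
A toy sketch of the idea, with made-up primes standing in for the real
512-bit ones (nothing from the paper, just the arithmetic):

    from math import gcd

    # Toy primes standing in for the real 512-bit ones.
    P, Q, R = 61, 53, 59
    A, B = P * Q, P * R      # two "public keys" that share the prime P

    g = gcd(A, B)            # Euclid's algorithm: a few division steps
    if g != 1:               # shared factor found: both keys are factored
        print("P =", g, "Q =", A // g, "R =", B // g)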

~~~
netdog
_So what the researchers did, apparently, was to gather all the RSA public
keys they could find (6 million or so) and then calculate the gcds of all
pairs of keys._

Apparently not. The number of unique pairings among 6 million is 5,999,999
factorial. Which is a _really_ large number.

~~~
neolefty
Isn't it 6 million squared? 36 x 10^12. Factorial would be the number of
ordered sets that include _all_ the elements.

~~~
wging
At the risk of redundancy (I wrote my post while you posted): you're double
counting, since gcd(a,b) = gcd(b,a); you needn't compute both, because either
one tells you whether a and b have a common factor, so (a,b) and (b,a) account
for the same problem. And you're including pairs of the same element:
gcd(a,a) = a always, but that's meaningless for the data set we're
considering. We don't care that a number has common factors with itself, only
with other numbers.
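
(For the record: the number of unordered pairs of n keys is n(n-1)/2, so for
n = 6 x 10^6 that comes to about 1.8 x 10^13 gcds, half of the 36 x 10^12
figure above.)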

------
grandpa
In case it's not obvious to everyone: Ron = Ron Rivest (the R in RSA); Whit =
Whitfield Diffie (of Diffie-Hellman).

~~~
locacorten
What a bad choice for a paper title.

~~~
scott_s
I don't agree. It's short and gets across the main point to the right people.
If you don't know the names being referred to, you're probably not the
audience for the paper.

~~~
cperciva
I am the audience for this paper, and I recognized the names.

I still had absolutely no clue what the title meant until grandpa pointed it
out. I agree with locacorten: Horrible title.

~~~
tptacek
You can't see "Whit" and immediately know "it's Whit Diffie", or given Whit
Diffie expect that "Ron" means "Ron Rivest"? I got it instantly and have way
less crypto than you.

Or are you saying, "Ron Rivest was wrong, Whitfield Diffie was right" is still
a horrible title?

The whole paper had the feel of a rump session pub, didn't it?

~~~
cperciva
_Or are you saying, "Ron Rivest was wrong, Whitfield Diffie was right" is
still a horrible title?_

Yes. That title made me think "ok, what famous argument did they have which
this is referring to?" (the way "Torvalds was wrong, Tanenbaum was right"
would) and I can't think of any famous arguments about whether people have
enough entropy in their prime generation.

~~~
tptacek
To an implementor, isn't one of the biggest differences between DH (super easy
to implement) and RSA (considerably trickier) the difference between needing
one prime number instead of two?

Anyways I take your point. The paper really doesn't have too much to do with
the title.

~~~
cperciva
To me, the biggest difference between DH and RSA is that DH is stateful and
RSA is stateless. Next up is that with DH you're generating random exponents
each time instead of doing it once during key generation.

The number of primes involved is _way_ down on my list of important
differences between DH and RSA.

------
tlb
Should certificate authorities be checking that any new key they sign doesn't
have a factor in common with any other key they've signed?

It would have discovered a few problems much earlier, but it conceivably leaks
information when your CSR gets rejected.
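
A sketch of what such a check could look like, assuming the CA keeps a
running product of every modulus it has signed (function and variable names
are hypothetical; the product gets enormous in practice, so a real deployment
would want a product/remainder tree instead):

    from math import gcd

    signed_product = 1   # running product of every modulus signed so far

    def check_csr_modulus(n):
        """Refuse to sign n if it shares a prime with any earlier key."""
        global signed_product
        if gcd(n, signed_product) != 1:
            return False          # shared factor: refuse to sign
        signed_product *= n
        return True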

~~~
DavidSJ
It would only "leak" information which is already public.

------
tptacek
This paper has as much social science commentary as math. Things I found
interesting or amusing:

pg2 (last graf into pg3)

 _Publication of results undermining the security of live keys is uncommon and
inappropriate, unless all affected parties have been notified. In the present
case, observing the above phenomena on lab-generated test data would not be
convincing and would not work either: tens of millions of thus generated RSA
moduli turned out to behave as expected based on the above assumption.
Therefore limited to live data, our intention was to confirm the assumption,
expecting at worst a very small number of counterexamples and affected owners
to be notified. The quagmire of vulnerabilities that we waded into makes it
infeasible to properly inform everyone involved..._

"Of course when we generate keys ourselves they work fine. We have no idea why
there are so many collisions in the real world, and there are so many
plausible suspects we don't even know where to start notifying people.

pg3 (last graf)

 _As was the case with the Debian OpenSSL vulnerability, we believe that
publication is preferable to attempts to cover up weaknesses. There is a risk,
but people desiring to exploit the current situation will have to redo our
work, both the data collection and the calculation, as we have put our data
under custody. We hope that people who know how to do the computation quickly
will show good judgment._

Translated: "On the scale of crypto attacks this is so easy to do that just
talking about it discloses the attack; please don't be dicks about that."

pg 5

 _... Except for eight times e = 1 and two even e-values among the PGP RSA
keys ..._

!!! (e = 1 means the "encryption" is the identity map, and an even e can't be
a valid RSA exponent at all, since e has to be invertible mod φ(n), which is
even.)

 _... Two e-values were found that, due to their size and random appearance,
may correspond to a short secret exponent (we have not investigated this)._

Good reference: Boneh at <http://www.ams.org/notices/199902/boneh.pdf>, under
"Elementary attacks".

pg 6

 _Looking at the owners with shared n-values among the relevant set of 266 729
X.509 certificates, many of the duplications are re-certifications or other
types of unsuspicious recycling of the same key material by its supposedly
legal owner. It also becomes clear that any single owner may come in many
different guises. On the other hand, there are also many instances where an
n-value is shared among seemingly unrelated owners. Distinguishing
intentionally shared keys from other duplications (which are prone to fraud)
is not straightforward, and is not facilitated by the volume of data we are
dealing with (as 266 729 cases have to be considered). We leave it as a
subject for further investigation into this "fuzzy" recognition problem to
come up with good insights, useful information, and reliable counts._

Translation: "This is not the only WTF we found in the survey."

pg 7

 _Although 512-bit and 768-bit RSA moduli were factored in 1999 and 2009,
respectively, 1.6% of the n-values have 512 bits (with 0.01% of size 384 and
smallest size 374 occurring once) and 0.8% of size 768. Those moduli are weak,
but still offer marginal security. A large number of the 512-bit ones were
certified after the year 2000 and even until a few years ago. With 73.9% the
most common size is 1024 bits, followed by 2048 bits with 21.7%. Sizes 3072,
4096, and 8192 contribute 0.04%, 1.5%, and 0.01%, respectively. The largest
size is 16384 bits, of which there are 181 (0.003%)._

Most certs have 1024 bit keys, but there are still semirecent 512 bit certs
out there.

pg 8

 _Generation of a regular RSA modulus consists of finding two random prime
numbers. This must be done in such a way that these primes were not selected
by anyone else before. The probability not to regenerate a prime is
commensurate with the security level if NIST's recommendation is followed to
use a random seed of bit-length twice the intended security level. Clearly,
this recommendation is not always followed._

The nut graf of the paper, right?

 _footnote:_

 _Much larger datasets can be handled without trouble. We chose not to
describe the details of our calculation or how well it scales_ (sapienti sat).
_On a 1.8GHz i7 processor the straightforward approach would require about ten
core-years and would not scale well. Inclusion of the p and q-values from
Section 4 and the primes related to the Debian OpenSSL vulnerability did not
produce additional results._

Sapienti sat: "a word to the wise", which I take to mean, if you understand
basic RSA attacks, this stuff is pretty straightforward. Notable: the Debian
OpenSSL CSPRNG bug didn't seem to contribute to this finding.

pg 9

 _Irrespective of the way primes are selected (additive/sieving methods or
methods using fresh random bits for each attempted prime selection), a variety
of obvious scenarios is conceivable where poor initial seeding may lead to
mishaps, with duplicate keys a consequence if no "fresh" local entropy is used
at all. If the latter is used, the outcome may be worse: for instance, a not-
properly-random first choice may be prime right away (the probability that
this happens is inversely proportional to the length of the prime, and thus
non-negligible) and miss its chance to profit from local entropy to become
unique. But local entropy may lead to a second prime that is unique, and thus
a vulnerable modulus._

To me the most interesting graf: here's a story about how generating a random
prime is tricky _even when you have a reliable source of random bits._ You
write a prime generator that mixes in fresh entropy only between guesses. If
your first guess, derived from a poor seed, is already prime, you never tap
the entropy at all; you generate "correct" but insecure results.
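
A sketch of that broken pattern (hypothetical code, not from any real
library; weak_prime and its Miller-Rabin helper are made up for illustration):

    import random, secrets

    def is_probable_prime(n, rounds=20):
        """Miller-Rabin; good enough for a sketch."""
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13):
            if n % p == 0:
                return n == p
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        for _ in range(rounds):
            a = random.randrange(2, n - 1)
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False
        return True

    def weak_prime(bits, seed):
        # BROKEN: the first candidate is derived entirely from the seed;
        # fresh local entropy is mixed in only *between* attempts.
        rng = random.Random(seed)                      # possibly poor seed
        c = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        while not is_probable_prime(c):
            # never reached if the first candidate is already prime
            c = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        return c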

Then, yadda yadda, I skipped all the stuff about DSA and ElGamal except:

 _The only interesting fact we can report about ECDSA is the surprisingly
small number of certificates encountered that contain an ECDSA key (namely,
just one), and the small number of certificates signed by ECDSA (one self-
signed and a handful of RSA keys)._

Yay for progress!

pg 13

 _The lack of sophistication of our methods and findings make it hard for us
to believe that what we have presented is new, in particular to agencies and
parties that are known for their curiosity in such matters. It may shed new
light on NIST's 1991 decision to adopt DSA as digital signature standard as
opposed to RSA, back then a "public controversy"; but note the well-known
nonce-randomness concerns for ElGamal and (EC)DSA and what happens if the
nonce is not properly used._

OH SNAP, NSA. This is so easy even YOU could figure it out.

Regarding nonce-randomness: this is what happened to Sony; if (EC)DSA nonces
are predictable or reused, the private key falls out.
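
For the record, the nonce-reuse algebra, with toy numbers (not Sony's actual
parameters; needs Python 3.8+ for pow(a, -1, q)): since s = k^-1 * (h + x*r)
mod q, two signatures sharing k give away both k and the private key x:

    # Toy check of the algebra: s_i = k^-1 * (h_i + x*r) mod q.
    q = 101                    # (sub)group order, toy-sized
    x, k, r = 23, 17, 45       # private key, repeated nonce, shared r value
    h1, h2 = 88, 13            # hashes of two different messages

    inv = lambda a: pow(a % q, -1, q)
    s1 = inv(k) * (h1 + x * r) % q
    s2 = inv(k) * (h2 + x * r) % q

    k_rec = (h1 - h2) * inv(s1 - s2) % q       # recover the nonce...
    x_rec = (s1 * k_rec - h1) * inv(r) % q     # ...then the private key
    assert (k_rec, x_rec) == (k, x)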

~~~
caf
I love this one:

 _The remaining connected component is the most intriguing - or suspicious -
as it is a complete graph on nine vertices (K9): nine primes, each of whose
9C2 = 36 pairwise products occurs as n-value._

There's an RSA key generator out there which randomly picks two primes from a
set of 9.

~~~
wging
I'd love to know if the author of that generator knew that that's what (s)he
was doing.

That is, I like the idea of someone trying to be a bit too clever and
accidentally borking their code due to their unfamiliarity with math rather
more than I like the idea of that kind of negligence.

Of course, who's to say that a single generator isn't responsible for more
than one connected component of the graph...?

------
nohat
Somewhat disappointing, but not surprising, that they did not describe their
actual calculation technique for finding the connected primes. It strikes me
that there isn't a good excuse for having bad random numbers with the
computation power we now have. A simple XOR of several independent sources of
supposedly random numbers is at least as strong as the strongest of its
components. So we can combine lots of questionably strong sources and get a
very strong source.
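
Taking "bitwise addition" to mean XOR, a minimal sketch (the guarantee only
holds if the sources are independent; correlated sources can cancel each
other out):

    import os, time, hashlib

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def combined_seed(nbytes=32):
        # XOR of several sources: if they're independent, the result is at
        # least as hard to predict as the hardest single source.
        sources = [
            os.urandom(nbytes),                                     # OS CSPRNG
            hashlib.sha256(str(time.time_ns()).encode()).digest(),  # weak: clock
        ]
        out = bytes(nbytes)
        for s in sources:
            out = xor_bytes(out, s[:nbytes])
        return out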

Generating the primes locally with insufficient independence seems more
understandable than the same primes being used by independent parties, which
hints at a more severe problem with random number generation. Regardless,
assuming this is confirmed, it is a significant problem that needs to be
hunted down.

~~~
a1k0n
It was mentioned in the nytimes article, and in retrospect it's fairly
obvious: you use the Euclidean algorithm.

If you have _n1_ = _p q1_ and _n2_ = _p q2_, so that the prime _p_ is shared
between them, then gcd(_n1_, _n2_) = _p_. That takes only a few bignum modulo
iterations.

~~~
nohat
Ah, I took the 'the straightforward approach would require about ten core-
years and would not scale well' to mean they had a more efficient method.

~~~
a1k0n
Hm, that's possible. A comprehensive search over 4 million keys would be on
the order of 16 trillion pairwise gcds, so perhaps they gain something by
reusing intermediate steps of the gcd, or something? Good point, I'm not sure.
The brute-force solution is still not out of the question for a modestly large
map-reduce cluster.
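
For what it's worth, there is a known way to make this much cheaper than 16
trillion independent gcds: a product tree plus remainder tree, i.e.
Bernstein-style "batch gcd". No claim that this is what the authors actually
did; a minimal sketch:

    from math import gcd

    def product_tree(ns):
        tree = [ns]
        while len(tree[-1]) > 1:
            lvl = tree[-1]
            tree.append([lvl[i] * lvl[i + 1] if i + 1 < len(lvl) else lvl[i]
                         for i in range(0, len(lvl), 2)])
        return tree

    def batch_gcd(ns):
        """For each modulus n_i, gcd(n_i, product of all the others)."""
        tree = product_tree(ns)
        rems = tree.pop()                       # [product of everything]
        while tree:
            lvl = tree.pop()
            rems = [rems[i // 2] % (n * n) for i, n in enumerate(lvl)]
        return [gcd(r // n, n) for r, n in zip(rems, ns)]

    # 15 and 21 share the prime 3; 22 shares nothing.
    assert batch_gcd([15, 21, 22]) == [3, 3, 1]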

~~~
notaddicted
Yeah, regarding going ahead with the "naive" search, I'm coming up with a
price of $5k to $15k on Amazon EC2 using the cluster compute 8XL instances,
and under a month using 10 machines.

~~~
nohat
Considering the potential value of some of those RSA keys, that is a quite
minimal investment. This might be an in-the-wild danger pretty quickly.

~~~
tptacek
The chances of any one key being compromised are very, very low.

If this is a pattern of simple software flaws, or something environmental like
keys generated during cold-start entropy starvation, the likelihood of pulling
a truly valuable key out of that soup is even lower: generally (not always,
but generally) the more valuable keys are generated using very well known
software, and under tight process constraints.

So basically you get ~10k random certs that allow you to MITM ~10k random
sites. It's not nothing, but criminals have better ways of monetizing attacker
effort.

~~~
nohat
A lot does depend on the actual circumstances for the bad RNG. If you had a
good method of recognizing the valuable sites (anything involving money or
maybe email?) it might be more practical considering a botnet would be just
fine for this sort of computation.

------
marshray
As if it's somehow Ron Rivest's fault that people can't generate random
numbers?

~~~
tptacek
"Yes, because his algorithm requires you to get two secure ones, and other
public key algorithms don't."

Would be the authors' response to that.

~~~
marshray
One could write an equivalent article suggesting the need to avoid weak groups
for Diffie-Hellman means that Rivest is right.

~~~
tptacek
The weak group problem is addressable by a NIST standard in a way that the
two-relatively-secure-primes problem isn't. "Here, use exactly these values",
instead of "use code that does X, not Y".

------
mindslight
Utterly simple and completely obvious in retrospect -> game changing.

Maybe it's a good idea to tag a public key with the software/version used to
create it. Debian fiasco keys are easy to pick out - the entropy was so low
that the bad keys can be enumerated. This new attack relies on any small
weakness in the distribution of primes, and it seems quite hard to track down
responsible versions of software.

I wonder how many of these shared factors _aren't_ the actual value of p or q
chosen at generation time (primality tests are probabilistic).

~~~
marshray
> Maybe it's a good idea to tag a public key with the software/version used to
> create it.

I'm a full-disclosure guy at heart, but why would I want to advertise to the
attacker my potential weaknesses?

Wouldn't I find my own personal selfish security is maximized when I falsify
that field when generating my keys?

~~~
djcapelis
> Wouldn't I find my own personal selfish security is maximized when I falsify
> that field when generating my keys?

It depends on whether you're the type of user who would like to work out for
themselves that their keys are vulnerable because of the software used, or the
type of user who would like to be informed by an automated script that their
keys are vulnerable.

I am guessing there are many in the latter category and a few in the former,
but at the same time most of the people who decide these things live in the
former.

As for you, marshray, personally, I'm going to go with yes. But does that
really make things worse for those who opt in to the potential for
notifications? I think you can make your security tradeoff and others can make
theirs and both categories of users will end up happy.

