Hacker News new | comments | show | ask | jobs | submit login
Ron was Wrong, Whit is Right (iacr.org)
122 points by sew on Feb 14, 2012 | hide | past | web | favorite | 59 comments

Posting what I said in the other thread: As far as I can tell, this is what they're doing: if two different keys have a factor in common (i.e. A=PQ, B=PR), then you can use Euclid's algorithm (which just requires repeated subtraction, and is thus really easy) to find P (=gcd(A,B)), and then just use division to find Q (=A/P) and R (=B/P) easily. So what the researchers did, apparently, was to gather all the RSA public keys they could find (6 million or so) and then calculate the gcds of all pairs of keys. (Presumably they were slightly more clever than this, but it doesn't really matter--doing this is still vastly easier than factoring even a single public key). Whenever they found a gcd that wasn't equal to 1, they'd cracked (at least) 2 keys. And it turns out that roughly 0.2% of the keys could be cracked in this way (for a total of 12720 1024-bit moduli). As far as I can tell, the problem isn't with RSA itself, but with whatever algorithms are used to pick two random primes to multiply and produce the key (or moduli, whatever)--they don't have high enough entropy and output duplicates ~0.2% of the time.

As far as I can tell, the problem isn't with RSA itself, but with whatever algorithms are used to pick two random primes to multiply and produce the key (or moduli, whatever)...


they don't have high enough entropy and output duplicates ~0.2% of the time.

Almost correct: They don't have enough entropy at least 0.2% of the time. These results are consistent with all primes being generated with ~40 bits of entropy -- which would imply that all the keys are broken. Obviously we hope that this is not the case and instead it's just a small subset of primes which were generated with low entropy, but the data do not provide any evidence in either direction.

They do write:

the graph of the moduli contained in c distinct certificates consists, in an ideal world, of c disjoint connected components each consisting of two vertices joined by a single edge, for a total of 2c vertices and c edges ... tens of millions of [lab-]generated RSA moduli turned out to behave as expected based on the above assumption.

That implies that there is at least one (presumably common) method for generating RSA keys which does not frequently output duplicate primes.

Security requires more than "does not frequently output duplicate primes". It requires "primes have a large amount of entropy".

So what the researchers did, apparently, was to gather all the RSA public keys they could find (6 million or so) and then calculate the gcds of all pairs of keys.

Apparently not. The number of unique pairings among 6 million is 5,999,999 factorial. Which is a _really_ large number.

Not so. Given a set of N elements from which you choose k (with order not mattering, since you only care whether a given two are the same) the number of pairs is N!/(k!(N-k)!). So it's (6000000!)/(2!(6000000-2)!).

We can take advantage of the cancellation properties of the factorial to write this as N\(N-1)\...\(N-k+1)/k! . If k=2 this resolves to N(N-1)/2.

You could also derive it like this: for ordered* pairs, you have N choices for the first and N-1 for the second (since you don't pair anything with itself but all other choices are permitted). This counts each pair exactly twice: each pair and its reverse are the same if you don't care about order. So N(N-1)/2.

Still trillions of combinations. Not 5999999! though.

Isn't it 6 million squared? 36 x 10^12. Factorial would be the number of ordered sets that include all the elements.

At the risk of redundancy (I wrote my post while you posted): you're double counting (since gcd(a,b) = gcd(b,a), you needn't compute both; either tells you whether a,b have a common factor. (a,b) and (b,a) having a common factor both account for the same problem) and you're including pairs of the same element (gcd(a,a)=a always, but this is meaningless for the data set we're considering--we don't care that a number has common factors with itself, only with other numbers)

In case it's not obvious to everyone: Ron = Ron Rivest (the R in RSA); Whit = Whitfield Diffie (of Diffie-Hellman).

What a bad choice for a paper title.

I don't agree. It's short and gets across the main point to the right people. If you don't know the names being referred to, you're probably not the audience for the paper.

I am the audience for this paper, and I recognized the names.

I still had absolutely no clue what the title meant until grandpa pointed it out. I agree with locacorten: Horrible title.

The title of a paper doesn't need to tell you exactly what the paper has in it. (It can't, except in those cases where the content of the paper is perfectly expressible in a single snappy sentence.)

It does need to (1) give a clue whether you want to read it and (2) let you recall what the paper is about when you see the title on a later occasion.

In this instance, #1 goes like this: "Oh, it's by Lenstra. And it's got a cutesy title. It's got to be worth a look, if only to see what on earth the title means." And #2 ... well, do you think any interested party who read this paper is going to have any difficulty remembering what it's about, say, 10 years from now? I don't.

You can't get away with this sort of thing if you aren't as big a name as Lenstra -- or if your paper isn't about something vitally important that (in hindsight) is easily associated with the paper title. But that's OK. Most of Lenstra's papers have titles like "Fast Irreducibility and Subgroup Membership Testing in XTR" and "Factoring Multivariate Polynomials over Algebraic Number Fields".

You can't see "Whit" and immediately know "it's Whit Diffie", or given Whit Diffie expect that "Ron" means "Ron Rivest"? I got it instantly and have way less crypto than you.

Or are you saying, "Ron Rivest was wrong, Whitfield Diffie was right" is still a horrible title?

The whole paper had the feel of a rump session pub, didn't it?

Or are you saying, "Ron Rivest was wrong, Whitfield Diffie was right" is still a horrible title?

Yes. That title made me think "ok, what famous argument dud they have which this is referring to?" (the way "Torvalds was wrong, Tanenbaum was right" would) and I can't think of any famous arguments about whether people have enough entropy in their prime generation.

To an implementor, isn't one of the biggest differences between DH (super easy to implement) and RSA (considerably trickier) the difference between needing one prime number instead of two?

Anyways I take your point. The paper really doesn't have too much to do with the title.

To me, the biggest difference between DH and RSA is that DH is stateful and RSA is stateless. Next up is that with DH you're generating random exponents each time instead of doing it once during key generation.

The number of primes involved is way down on my list of important differences between DH and RSA.

Stephen was wrong, Kip was right! :)

The whole title is horrible because of its not even related to the outcome of the experiment/paper. Ron and Whit are neither both right nor both wrong, nothing in the article even came close to proving such a title. Using bad entropy is wrong.

What would be nice to see is the age of the affected certificates graphed in a scatter. It would be interesting to note if there are patterns of grouping. We all read the articles from Peter Goodman et al where he demonstrates horrible entropy [0,0,0,0,0,0,0,0].

I disagreed with your first point downthread.

Arjen Lenstra gets to pick whatever titles he likes.

Of course, but that doesn't make it beyond criticism!

I personally liked it and am somewhat surprised that cperciva didn't. That alone made reading this conversation interesting as far as I'm concerned. And is going to make me think harder about opting for titles on my papers that opt for Whit over content.

I agree. The math and theory behind RSA is correct, it's only certain implementations on the web that are lacking.

Should certificate authorities be checking that any new key they sign doesn't have a common factor in common with any other key they've signed?

It would have discovered a few problems much earlier, but it conceivably leaks information when your CSR gets rejected.

It would only "leak" information which is already public.

This paper has as much social science commentary as math. Things I found interesting or amusing:

pg2 (last graf into pg3)

Publication of results undermining the security of live keys is uncommon and inappropriate, unless all aff ected parties have been noti fied. In the present case, observing the above phenomena on lab-generated test data would not be convincing and would not work either: tens of millions of thus generated RSA moduli turned out to behave as expected based on the above assumption. Therefore limited to live data, our intention was to con firm the assumption, expecting at worst a very small number of counterexamples and aff ected owners to be notifi ed. The quagmire of vulnerabilities that we waded into, makes it infeasible to properly inform everyone involved...

"Of course when we generate keys ourselves they work fine. We have no idea why there are so many collisions in the real world, and there are so many plausible suspects we don't even know where to start notifying people.

pg3 (last graf)

As was the case with the Debian OpenSSL vulnerability, we believe that publication is preferable to attempts to cover up weaknesses. There is a risk, but people desiring to exploit the current situation will have to redo our work, both the data collection and the calculation, as we have put our data under custody. We hope that people who know how to do the computation quickly will show good judgment.

Translated: "On the scale of crypto attacks this is so easy to do that just talking about it discloses the attack; please don't be dicks about that."

pg 5

... Except for eight times e = 1 and two even e-values among the PGP RSA keys ...


... Two e-values were found that, due to their size and random appearance, may correspond to a short secret exponent (we have not investigated this).

Good reference: Boneh at http://www.ams.org/notices/199902/boneh.pdf --- under "Elementary attacks"

pg 6

Looking at the owners with shared n-values among the relevant set of 266 729 X.509 certi cates, many of the duplications are re-certi cations or other types of unsuspicious recycling of the same key material by its supposedly legal owner. It also becomes clear that any single owner may come in many di fferent guises. On the other hand, there are also many instances where an n-value is shared among seemingly unrelated owners. Distinguishing intentionally shared keys from other duplications (which are prone to fraud) is not straightforward, and is not facilitated by the volume of data we are dealing with (as 266 729 cases have to be considered). We leave it as a subject for further investigation into this "fuzzy" recognition problem to come up with good insights, useful information, and reliable counts.

Translation: "This is not the only WTF we found in the survey."

pg 7

Although 512-bit and 768-bit RSA moduli were factored in 1999 and 2009, respectively, 1.6% of the n-values have 512 bits (with 0.01% of size 384 and smallest size 374 occurring once) and 0.8% of size 768. Those moduli are weak, but still o ffer marginal security. A large number of the 512-bit ones were certifi ed after the year 2000 and even until a few years ago. With 73.9% the most common size is 1024 bits, followed by 2048 bits with 21.7%. Sizes 3072, 4096, and 8192 contribute 0.04%, 1.5%, and 0.01%, respectively. The largest size is 16384 bits, of which there are 181 (0.003%).

Most certs have 1024 bit keys, but there are still semirecent 512 bit certs out there.

pg 8

Generation of a regular RSA modulus consists of finding two random prime numbers. This must be done in such a way that these primes were not selected by anyone else before. The probability not to regenerate a prime is commensurate with the security level if NIST's recommendation is followed to use a random seed of bit-length twice the intended security level. Clearly, this recommendation is not always followed.

The nut graf of the paper, right?


Much larger datasets can be handled without trouble. We chose not to describe the details of our calculation or how well it scales (sapienti sat). On a 1.8GHz i7 processor the straightforward approach would require about ten core-years and would not scale well. Inclusion of the p and q-values from Section 4 and the primes from related to the Debian OpenSSL vulnerability did not produce additional results.

Sapienti sat: "a word to the wise", which I take to mean, if you understand basic RSA attacks, this stuff is pretty straightforward. Notable: the Debian OpenSSL CSPRNG bug didn't seem to contribute to this finding.

pg 9

Irrespective of the way primes are selected (additive/sieving methods or methods using fresh random bits for each attempted prime selection), a variety of obvious scenarios is conceivable where poor initial seeding may lead to mishaps, with duplicate keys a consequence if no "fresh" local entropy is used at all. If the latter is used, the outcome may be worse: for instance, a not-properly-random fi rst choice may be prime right away (the probability that this happens is inversely proportional to the length of the prime, and thus non-negligible) and miss its chance to profi t from local entropy to become unique. But local entropy may lead to a second prime that is unique, and thus a vulnerable modulus.

To me the most interesting graf: here's a story about how generating a random prime is tricky even when you have a reliable source of random bits. You write a prime generator that taps entropy only after each guess. If your first guess is prime, you never tap entropy at all; you generate "correct" but insecure results.

Then, yadda dadda I skipped all the stuff about DSA and ElGamal except:

The only interesting fact we can report about ECDSA is the surprisingly small number of certi cates encountered that contain an ECDSA key (namely, just one), and the small number of certi cates signed by ECDSA (one self-signed and a handful of RSA keys).

Yay for progress!

pg 13

The lack of sophistication of our methods and fi ndings make it hard for us to believe that what we have presented is new, in particular to agencies and parties that are known for their curiosity in such matters. It may shed new light on NIST's 1991 decision to adopt DSA as digital signature standard as opposed to RSA, back then a "public controversy"; but note the well-known nonce-randomness concerns for ElGamal and (EC)DSA and what happens if the nonce is not properly used.

OH SNAP, NSA. This is so easy even YOU could figure it out.

Regarding nonce-randomness: this is what happened to Sony; if nonces to DSA are predictable, the whole algorithm falls.

I love this one:

The remaining connected component is the most intriguing - or suspicious - as it is a complete graph on nine vertices (K9): nine primes, each of whose 9C2 = 36 pairwise products occurs as n-value.

There's an RSA key generator out there which randomly picks two primes from a set of 9.

I'd love to know if the author of that generator knew that that's what (s)he was doing.

That is, I like the idea of someone trying to be a bit too clever and accidentally borking their code due to their unfamiliarity with math rather more than I like the idea of that kind of negligence.

Of course, who's to say that a single generator isn't responsible for more than one connected component of the graph...?

> OH SNAP, NSA. This is so easy even YOU could figure it out.

For what it's worth, my impression of that text was not as much about whether they could figure it out, but that they pretty much must have at this point given how simple it was. I don't think they were dissing the capabilities of the NSA, but simply trying to provide insight on the likelihood that this was included in them, and they concluded it almost certainly was.

I'm sure you're right, but the premise I was (facetiously) working from is that the NSA would ordinarily be presumed to have figured something out regardless of the level of sophistication involved in the attack.

Ah. We academics are sticklers about novelty and we try and pin down what is and isn't. We also maintain the conceit that most of our published ideas are. As far as most of us are concerned, our ideas are probably novel unless published elsewhere. These types of things are notable exceptions.

Which reminds me of a conversation I was having the other day which noted that there's not much fear of the NSA scooping you as a researcher because they don't really like publishing things they find useful. :)

It's just our weird and somewhat arrogant perspective, I think. :)

Another (related) winning quote (this paper is full of them):

Factoring one 1024-bit RSA modulus would be historic. Factoring 12720 such moduli is a statistic

I actually thought that paraphrasing Stalin jarred a bit.

Thanks for the excellent breakdown/notes.

Somewhat disappointing, but not surprising that they did not describe their actual calculation technique for finding the connected primes. It strikes me that there isn't a good excuse for having bad random numbers with the computation power we now have. A simple bitwise addition of various sources of supposedly random numbers is strictly stronger than any subset of its components. So we can combine lots of questionable strong sources and get a very strong source.

Generating the primes locally with insufficient independence seems more understandable than that the same primes are used by independent parties. That hints at a more severe problem with random number generation. Regardless, assuming this is confirmed, it is a significant problem that needs to be hunted down.

It was mentioned in the nytimes article, and in retrospect it's fairly obvious: you use the Euclidean algorithm.

If you have n1=p q1 and n2=p q2, so that the prime p is shared between them, then gcd(n1,n2) = p. That only takes a few iterations of bignum modulos to solve.

Ah, I took the 'the straightforward approach would require about ten core-years and would not scale well' to mean they had a more efficient method.

Hm, that's possible. A comprehensive search over 4 million pairs would be O(16 trillion) operations, so perhaps they gain something by reusing intermediate steps of the gcd, or something? Good point, I'm not sure. The brute-force solution is still not out of the question for a modestly large map-reduce cluster.

Yeah, regarding going ahead with the "naive" search, I'm coming up with a price of 5k to 15k on Amazon EC2 using the cluster compute 8XL instances, and under a month using 10 machines.

Considering the potential value of some of those rsa keys, that is a quite minimal investment. This might be an in the wild danger pretty quickly.

The chances of any one key being compromised are very, very low.

If this is a pattern of simple software flaws, or something environmental like keys generating during cold start entropy starvation, the likelihood of pulling a truly valuable key out of that soup are even lower: generally (not always, but generally) the more valuable keys are generated using very well known software, and under tight process constraints.

So basically you get ~10k random certs that allow you to MITM ~10k random sites. It's not nothing, but criminals have better ways of monetizing attacker effort.

A lot does depend on the actual circumstances for the bad RNG. If you had a good method of recognizing the valuable sites (anything involving money or maybe email?) it might be more practical considering a botnet would be just fine for this sort of computation.

As if it's somehow Ron Rivest's fault that people can't generate random numbers?

"Yes, because his algorithm requires you to get two secure ones, and other public key algorithms don't."

Would be the authors' response to that.

One could write an equivalent article suggesting the need to avoid weak groups for Diffie-Hellman means that Rivest is right.

The weak group problem is addressable by a NIST standard in a way that the two-relatively-secure-primes problem isn't. "Here, use exactly these values", instead of "use code that does X, not Y".

And the retort to the author would be the case of the PS3 and ECDSA. Using low entropy PRNGs at all is the _REAL_ problem.

Are we sure the problem is a low entropy CSPRNG? The problem could be "when bad primes happen to good CSPRNGs".

(The paper itself makes the DSA point.)

They do mention that the lower depth results of the tree could be from bad entropy on a CSPRNG. Which is what I'm making my comment from. There may be a potential flaw in an implementation e.g CRT.

My comment was in regards to crypto usage in general where the most problematic factor is the misuse of PRNG and seeds. (ECDSA PS3 for example).

edit: Article makes points towards a flaw in the CSPRNG and Prime generation.

I don't think I'm being clear. Sorry.

Obviously, if unrelated people around the world are at random times coming up with the same (ostensibly) random gigantic prime factors, there is a randomness issue.

The point the paper makes is that this isn't necessarily a CSPRNG issue. The issue is not necessarily that people are using rand() to generate primes. It's that to generate factors for RSA, you need more than a CSPRNG: you need a prime number generator that correctly uses the CSPRNG. The paper hypothesizes one very plausible prime number generator that would be insecure even if driven off hardware entropy.

I understand your point now and reread that end section 3 of the article. The authors seem to imply that it is possibly resultant from a toolkit which a) doesn't implement the generation of provable primes correctly and therefor b) doesn't comply with NIST standards and will be not be considered for FIPS 186-3 approval.

I've tried to do quick research and the method of generating random primes seems to have been added between FIPS 186-2 and FIPS 186-3. Since FIPS 186-3 only came out (approved not draft) in 2009 its probably, feel free to prove this extrapolation wrong, that these older key pairs could've been generated by an older toolkit, or in a time where this random prime generation recommendation was unknown. So once again we're bitten by these multi-year certificates.

I'm guilty of not having read every word of the paper. But for those who have, perhaps you could indulge my laziness:

Have they ruled out the cause as a buggy or backdoored key generator? Or running down the cause left for some future research?

They did not rule that out, but when they write things like "the quagmire of vulnerabilities that we waded into, makes it infeasible to properly inform everyone involved", I don't get the sense that "backdoor" is top of mind.

Utterly simple and completely obvious in retrospect -> game changing.

Maybe it's a good idea to tag a public key with the software/version used to create it. Debian fiasco keys are easy to pick out - the entropy was so low that the bad keys can be enumerated. This new attack relies on any small weakness in the distribution of primes, and it seems quite hard to track down responsible versions of software.

I wonder how many of these shared factors aren't the actual value of p or q chosen at generation time (primality tests are probabilistic).

> Maybe it's a good idea to tag a public key with the software/version used to create it.

I'm a full-disclosure guy at heart, but why would I want to advertise to the attacker my potential weaknesses?

Wouldn't I find my own personal selfish security is maximized when I falsify that field when generating my keys?

> Wouldn't I find my own personal selfish security is maximized when I falsify that field when generating my keys?

It depends if you're the type of user who would like to rely on hearing themselves that their keys are vulnerable because of the software used or if you're the type of user who would like to be informed by an automated script that their keys are vulnerable.

I am guessing there are many in the latter category and a few in the former category but at the same time most of the people who decide these things will live in the former.

As for you, marshray, personally, I'm going to go with yes. But does that really make things worse for those who opt in to the potential for notifications? I think you can make your security tradeoff and others can make theirs and both categories of users will end up happy.

Hence the 'maybe'. But what information would actually be leaked by knowing the generation software (assuming it's decoupled from the server software)? It seems if a certain implementation's keys are weak, one could simply try to exploit that over all keys. I don't need to know you're running Debian before I go scanning for those particular weak keys.

The responsible version of software are armchair cryptographers or average programmers trying to get things to 'work'. I bet this all results simply using OpenSSL with a SAMPLE to seed a PRNG or a bad seed of 0's.

To see if it could potentially be a afflicted version of a toolkit, a scatter plot of dates and frequencies of these shared primes would provide a little more information.

Bad system entropy is probably causing a lot of the duplicate moduli (complete keys), but not the cases where only one factor is shared. To me, that seems like an implementation weakness skewing the distribution of (probable) primes.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact