Ron was Wrong, Whit is Right (iacr.org) 122 points by sew on Feb 14, 2012 | 59 comments

 Posting what I said in the other thread: As far as I can tell, this is what they're doing: if two different keys have a factor in common (i.e. A=PQ, B=PR), then you can use Euclid's algorithm (which just requires repeated subtraction, and is thus really easy) to find P (=gcd(A,B)), and then just use division to find Q (=A/P) and R (=B/P) easily. So what the researchers did, apparently, was to gather all the RSA public keys they could find (6 million or so) and then calculate the gcds of all pairs of keys. (Presumably they were slightly more clever than this, but it doesn't really matter--doing this is still vastly easier than factoring even a single public key). Whenever they found a gcd that wasn't equal to 1, they'd cracked (at least) 2 keys. And it turns out that roughly 0.2% of the keys could be cracked in this way (for a total of 12720 1024-bit moduli). As far as I can tell, the problem isn't with RSA itself, but with whatever algorithms are used to pick two random primes to multiply and produce the key (or moduli, whatever)--they don't have high enough entropy and output duplicates ~0.2% of the time.
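The attack described above can be sketched in a few lines of Python (a toy illustration with tiny primes standing in for 512-bit ones; the researchers' actual pipeline isn't spelled out at this level of detail):

```python
from math import gcd

def shared_factor_attack(moduli):
    """Try every pair of RSA moduli; any nontrivial gcd is a shared
    prime, which immediately factors both moduli via one division."""
    cracked = {}
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            p = gcd(a, b)
            if p != 1 and p != a and p != b:  # shared prime found
                cracked[a] = (p, a // p)
                cracked[b] = (p, b // p)
    return cracked

# toy example: 77 = 7*11 and 91 = 7*13 share the factor 7
print(shared_factor_attack([77, 91, 17 * 19]))
# -> {77: (7, 11), 91: (7, 13)}
```

Note that no factoring happens anywhere: Euclid's algorithm and a single division do all the work, which is why this scales to millions of keys.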
> As far as I can tell, the problem isn't with RSA itself, but with whatever algorithms are used to pick two random primes to multiply and produce the key (or moduli, whatever)...

Correct.

> they don't have high enough entropy and output duplicates ~0.2% of the time.

Almost correct: They don't have enough entropy at least 0.2% of the time. These results are consistent with all primes being generated with ~40 bits of entropy -- which would imply that all the keys are broken. Obviously we hope that this is not the case and instead it's just a small subset of primes which were generated with low entropy, but the data do not provide any evidence in either direction.
They do write:

> the graph of the moduli contained in c distinct certificates consists, in an ideal world, of c disjoint connected components each consisting of two vertices joined by a single edge, for a total of 2c vertices and c edges ... tens of millions of [lab-]generated RSA moduli turned out to behave as expected based on the above assumption.

That implies that there is at least one (presumably common) method for generating RSA keys which does not frequently output duplicate primes.
 Security requires more than "does not frequently output duplicate primes". It requires "primes have a large amount of entropy".
> So what the researchers did, apparently, was to gather all the RSA public keys they could find (6 million or so) and then calculate the gcds of all pairs of keys.

Apparently not. The number of unique pairings among 6 million is 5,999,999 factorial. Which is a _really_ large number.
Not so. Given a set of N elements from which you choose k (with order not mattering, since you only care whether a given two are the same), the number of pairs is N!/(k!(N-k)!). So it's (6000000!)/(2!(6000000-2)!).

We can take advantage of the cancellation properties of the factorial to write this as N(N-1)...(N-k+1)/k!. If k=2 this resolves to N(N-1)/2.

You could also derive it like this: for ordered pairs, you have N choices for the first and N-1 for the second (since you don't pair anything with itself but all other choices are permitted). This counts each pair exactly twice: each pair and its reverse are the same if you don't care about order. So N(N-1)/2.

Still trillions of combinations. Not 5999999! though.
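The N(N-1)/2 arithmetic is quick to check (a trivial sketch):

```python
def unordered_pairs(n):
    # N*(N-1)/2: each of N elements pairs with N-1 others, halved
    # because (a, b) and (b, a) are the same unordered pair
    return n * (n - 1) // 2

print(unordered_pairs(6_000_000))  # 17999997000000 -- ~1.8e13, not 5999999!
```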
 Isn't it 6 million squared? 36 x 10^12. Factorial would be the number of ordered sets that include all the elements.
 At the risk of redundancy (I wrote my post while you posted): you're double counting (since gcd(a,b) = gcd(b,a), you needn't compute both; either tells you whether a,b have a common factor. (a,b) and (b,a) having a common factor both account for the same problem) and you're including pairs of the same element (gcd(a,a)=a always, but this is meaningless for the data set we're considering--we don't care that a number has common factors with itself, only with other numbers)
 In case it's not obvious to everyone: Ron = Ron Rivest (the R in RSA); Whit = Whitfield Diffie (of Diffie-Hellman).
 What a bad choice for a paper title.
 I don't agree. It's short and gets across the main point to the right people. If you don't know the names being referred to, you're probably not the audience for the paper.
I am the audience for this paper, and I recognized the names. I still had absolutely no clue what the title meant until grandpa pointed it out. I agree with locacorten: Horrible title.
The title of a paper doesn't need to tell you exactly what the paper has in it. (It can't, except in those cases where the content of the paper is perfectly expressible in a single snappy sentence.)

It does need to (1) give a clue whether you want to read it and (2) let you recall what the paper is about when you see the title on a later occasion.

In this instance, #1 goes like this: "Oh, it's by Lenstra. And it's got a cutesy title. It's got to be worth a look, if only to see what on earth the title means." And #2 ... well, do you think any interested party who read this paper is going to have any difficulty remembering what it's about, say, 10 years from now? I don't.

You can't get away with this sort of thing if you aren't as big a name as Lenstra -- or if your paper isn't about something vitally important that (in hindsight) is easily associated with the paper title. But that's OK. Most of Lenstra's papers have titles like "Fast Irreducibility and Subgroup Membership Testing in XTR" and "Factoring Multivariate Polynomials over Algebraic Number Fields".
 You can't see "Whit" and immediately know "it's Whit Diffie", or given Whit Diffie expect that "Ron" means "Ron Rivest"? I got it instantly and have way less crypto than you.Or are you saying, "Ron Rivest was wrong, Whitfield Diffie was right" is still a horrible title?The whole paper had the feel of a rump session pub, didn't it?
> Or are you saying, "Ron Rivest was wrong, Whitfield Diffie was right" is still a horrible title?

Yes. That title made me think "ok, what famous argument did they have which this is referring to?" (the way "Torvalds was wrong, Tanenbaum was right" would) and I can't think of any famous arguments about whether people have enough entropy in their prime generation.
To an implementor, isn't one of the biggest differences between DH (super easy to implement) and RSA (considerably trickier) the difference between needing one prime number instead of two?

Anyways, I take your point. The paper really doesn't have too much to do with the title.
To me, the biggest difference between DH and RSA is that DH is stateful and RSA is stateless. Next up is that with DH you're generating random exponents each time instead of doing it once during key generation.

The number of primes involved is way down on my list of important differences between DH and RSA.
 Stephen was wrong, Kip was right! :)
The whole title is horrible because it's not even related to the outcome of the experiment/paper. Ron and Whit are neither both right nor both wrong; nothing in the article even came close to proving such a title. Using bad entropy is wrong.

What would be nice to see is the age of the affected certificates graphed in a scatter plot. It would be interesting to note if there are patterns of grouping. We all read the articles from Peter Goodman et al where he demonstrates horrible entropy [0,0,0,0,0,0,0,0].
 Arjen Lenstra gets to pick whatever titles he likes.
Of course, but that doesn't make it beyond criticism!

I personally liked it and am somewhat surprised that cperciva didn't. That alone made reading this conversation interesting as far as I'm concerned. And it is going to make me think harder about choosing titles for my papers that opt for Whit over content.
 I agree. The math and theory behind RSA is correct, it's only certain implementations on the web that are lacking.
Should certificate authorities be checking that any new key they sign doesn't share a common factor with any other key they've signed?

It would have discovered a few problems much earlier, but it conceivably leaks information when your CSR gets rejected.
 It would only "leak" information which is already public.
I love this one:

> The remaining connected component is the most intriguing - or suspicious - as it is a complete graph on nine vertices (K9): nine primes, each of whose 9C2 = 36 pairwise products occurs as n-value.

There's an RSA key generator out there which randomly picks two primes from a set of 9.
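That K9 structure is easy to reproduce with small stand-in primes (a toy sketch; the real generator's nine primes were of course enormous):

```python
from itertools import combinations

# nine stand-in primes; a generator that picks any two of them can only
# ever produce 9C2 = 36 distinct moduli, forming a complete graph K9
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
moduli = {p * q for p, q in combinations(primes, 2)}
print(len(moduli))  # 36
```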
I'd love to know if the author of that generator knew that that's what (s)he was doing.

That is, I like the idea of someone trying to be a bit too clever and accidentally borking their code due to their unfamiliarity with math rather more than I like the idea of that kind of negligence.

Of course, who's to say that a single generator isn't responsible for more than one connected component of the graph...?
> OH SNAP, NSA. This is so easy even YOU could figure it out.

For what it's worth, my impression of that text was not as much about whether they could figure it out, but that they pretty much must have at this point given how simple it was. I don't think they were dissing the capabilities of the NSA, but simply trying to provide insight on the likelihood that this attack was already among those capabilities, and they concluded it almost certainly was.
 I'm sure you're right, but the premise I was (facetiously) working from is that the NSA would ordinarily be presumed to have figured something out regardless of the level of sophistication involved in the attack.
Ah. We academics are sticklers about novelty and we try to pin down what is and isn't novel. We also maintain the conceit that most of our published ideas are. As far as most of us are concerned, our ideas are probably novel unless published elsewhere. These types of things are notable exceptions.

Which reminds me of a conversation I was having the other day which noted that there's not much fear of the NSA scooping you as a researcher, because they don't really like publishing things they find useful. :)

It's just our weird and somewhat arrogant perspective, I think. :)
Another (related) winning quote (this paper is full of them):

> Factoring one 1024-bit RSA modulus would be historic. Factoring 12720 such moduli is a statistic.
 I actually thought that paraphrasing Stalin jarred a bit.
 Thanks for the excellent breakdown/notes.
Somewhat disappointing, but not surprising, that they did not describe their actual calculation technique for finding the connected primes. It strikes me that there isn't a good excuse for having bad random numbers with the computation power we now have. A simple bitwise XOR of various sources of supposedly random numbers is at least as strong as the strongest of its components (provided the sources are independent). So we can combine lots of questionable sources and get a very strong source.

Generating the primes locally with insufficient independence seems more understandable than that the same primes are used by independent parties. That hints at a more severe problem with random number generation. Regardless, assuming this is confirmed, it is a significant problem that needs to be hunted down.
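The source-combining idea (XOR, i.e. bitwise addition mod 2) might look like this -- a sketch, assuming each source yields a fixed-width byte string; if any one source is unpredictable and independent of the others, the XOR is too:

```python
import hashlib
import os
import time

def combined_seed(width=32):
    """XOR together several candidate entropy sources; the result is at
    least as unpredictable as the strongest independent source."""
    sources = [
        os.urandom(width),                                      # OS entropy pool
        hashlib.sha256(str(time.time_ns()).encode()).digest(),  # timing jitter
        hashlib.sha256(str(os.getpid()).encode()).digest(),     # weak, but harmless
    ]
    out = bytes(width)
    for s in sources:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out
```

The weak sources (pid, time) add nothing if `os.urandom` is good, but crucially they also take nothing away -- which is the commenter's point.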
 It was mentioned in the nytimes article, and in retrospect it's fairly obvious: you use the Euclidean algorithm.If you have n1=p q1 and n2=p q2, so that the prime p is shared between them, then gcd(n1,n2) = p. That only takes a few iterations of bignum modulos to solve.
 Ah, I took the 'the straightforward approach would require about ten core-years and would not scale well' to mean they had a more efficient method.
Hm, that's possible. A comprehensive search over 4 million keys would be O(16 trillion) gcd operations, so perhaps they gain something by reusing intermediate steps of the gcd, or something? Good point, I'm not sure. The brute-force solution is still not out of the question for a modestly large map-reduce cluster.
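One way to beat the naive quadratic scan -- and a plausible reading of "reusing intermediate steps" -- is Bernstein-style batch GCD using product and remainder trees, which computes gcd(n_i, product of all other n_j) for every modulus in quasi-linear time. A small sketch (an assumption on my part; the paper doesn't say what the authors actually used):

```python
from math import gcd

def batch_gcd(ns):
    """For each n in ns, return gcd(n, product of all the others).
    A result > 1 flags a modulus sharing a factor with some other modulus."""
    # product tree: level 0 is ns, each level multiplies adjacent pairs
    levels = [ns]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([cur[i] * cur[i + 1] if i + 1 < len(cur) else cur[i]
                       for i in range(0, len(cur), 2)])
    # remainder tree: push the full product down, reducing mod n_i^2
    rems = levels.pop()
    while levels:
        cur = levels.pop()
        rems = [rems[i // 2] % (x * x) for i, x in enumerate(cur)]
    return [gcd(r // n, n) for r, n in zip(rems, ns)]

# 15 = 3*5 and 21 = 3*7 share the factor 3; 143 = 11*13 is clean
print(batch_gcd([15, 21, 143]))  # [3, 3, 1]
```

One caveat: if a modulus happens to share factors with more than one other modulus, the returned gcd can be the modulus itself, and you fall back to pairwise gcds for that small subset.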
 Yeah, regarding going ahead with the "naive" search, I'm coming up with a price of 5k to 15k on Amazon EC2 using the cluster compute 8XL instances, and under a month using 10 machines.
Considering the potential value of some of those RSA keys, that is a quite minimal investment. This might be an in-the-wild danger pretty quickly.
The chances of any one key being compromised are very, very low.

If this is a pattern of simple software flaws, or something environmental like keys generated during cold-start entropy starvation, the likelihood of pulling a truly valuable key out of that soup is even lower: generally (not always, but generally) the more valuable keys are generated using very well-known software, and under tight process constraints.

So basically you get ~10k random certs that allow you to MITM ~10k random sites. It's not nothing, but criminals have better ways of monetizing attacker effort.
 A lot does depend on the actual circumstances for the bad RNG. If you had a good method of recognizing the valuable sites (anything involving money or maybe email?) it might be more practical considering a botnet would be just fine for this sort of computation.
 As if it's somehow Ron Rivest's fault that people can't generate random numbers?
 "Yes, because his algorithm requires you to get two secure ones, and other public key algorithms don't."Would be the authors' response to that.
 One could write an equivalent article suggesting the need to avoid weak groups for Diffie-Hellman means that Rivest is right.
 The weak group problem is addressable by a NIST standard in a way that the two-relatively-secure-primes problem isn't. "Here, use exactly these values", instead of "use code that does X, not Y".
 And the retort to the author would be the case of the PS3 and ECDSA. Using low entropy PRNGs at all is the _REAL_ problem.
 Are we sure the problem is a low entropy CSPRNG? The problem could be "when bad primes happen to good CSPRNGs".(The paper itself makes the DSA point.)
They do mention that the lower-depth results of the tree could be from bad entropy in a CSPRNG, which is what I'm basing my comment on. There may be a potential flaw in an implementation, e.g. CRT.

My comment was in regards to crypto usage in general, where the most problematic factor is the misuse of PRNGs and seeds (ECDSA on the PS3, for example).

edit: Article makes points towards a flaw in the CSPRNG and prime generation.
I don't think I'm being clear. Sorry.

Obviously, if unrelated people around the world are at random times coming up with the same (ostensibly) random gigantic prime factors, there is a randomness issue.

The point the paper makes is that this isn't necessarily a CSPRNG issue. The issue is not necessarily that people are using rand() to generate primes. It's that to generate factors for RSA, you need more than a CSPRNG: you need a prime number generator that correctly uses the CSPRNG. The paper hypothesizes one very plausible prime number generator that would be insecure even if driven off hardware entropy.
I understand your point now and reread the end of section 3 of the article. The authors seem to imply that it possibly results from a toolkit which a) doesn't implement the generation of provable primes correctly and therefore b) doesn't comply with NIST standards and will not be considered for FIPS 186-3 approval.

I've tried to do quick research, and the method of generating random primes seems to have been added between FIPS 186-2 and FIPS 186-3. Since FIPS 186-3 only came out (approved, not draft) in 2009, it's probable (feel free to prove this extrapolation wrong) that these older key pairs could've been generated by an older toolkit, or in a time when this random prime generation recommendation was unknown. So once again we're bitten by these multi-year certificates.
I'm guilty of not having read every word of the paper. But for those who have, perhaps you could indulge my laziness: have they ruled out a buggy or backdoored key generator as the cause? Or is running down the cause left for some future research?
 They did not rule that out, but when they write things like "the quagmire of vulnerabilities that we waded into, makes it infeasible to properly inform everyone involved", I don't get the sense that "backdoor" is top of mind.
Utterly simple and completely obvious in retrospect -> game changing.

Maybe it's a good idea to tag a public key with the software/version used to create it. Debian fiasco keys are easy to pick out - the entropy was so low that the bad keys can be enumerated. This new attack relies on any small weakness in the distribution of primes, and it seems quite hard to track down responsible versions of software.

I wonder how many of these shared factors aren't the actual value of p or q chosen at generation time (primality tests are probabilistic).
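On the probabilistic-primality point: the standard Miller-Rabin test lets a composite survive each round with probability at most 1/4, so a composite slipping through enough rounds to ship in a key is astronomically unlikely. A minimal version (a textbook sketch, not any particular toolkit's code):

```python
import random

def miller_rabin(n, rounds=40):
    """Probabilistic primality test; a composite survives each round
    with probability at most 1/4, so 40 rounds leave error < 4**-40."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if n % small == 0:
            return n == small
    # write n - 1 as d * 2**r with d odd
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is definitely composite
    return True
```

So a shared factor that is secretly composite would point at a generator skipping or botching this test, not at the test's inherent error rate.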
> Maybe it's a good idea to tag a public key with the software/version used to create it.

I'm a full-disclosure guy at heart, but why would I want to advertise to the attacker my potential weaknesses?

Wouldn't I find my own personal selfish security is maximized when I falsify that field when generating my keys?
> Wouldn't I find my own personal selfish security is maximized when I falsify that field when generating my keys?

It depends whether you're the type of user who would rather discover for themselves that their keys are vulnerable because of the software used, or the type who would like to be informed by an automated script that their keys are vulnerable.

I am guessing there are many in the latter category and a few in the former, but at the same time most of the people who decide these things will live in the former.

As for you, marshray, personally, I'm going to go with yes. But does that really make things worse for those who opt in to the potential for notifications? I think you can make your security tradeoff and others can make theirs, and both categories of users will end up happy.
 Hence the 'maybe'. But what information would actually be leaked by knowing the generation software (assuming it's decoupled from the server software)? It seems if a certain implementation's keys are weak, one could simply try to exploit that over all keys. I don't need to know you're running Debian before I go scanning for those particular weak keys.
The responsible versions of software are from armchair cryptographers or average programmers trying to get things to 'work'. I bet this all results from simply using OpenSSL with sample code to seed a PRNG, or a bad seed of all zeroes.

To see if it could potentially be an afflicted version of a toolkit, a scatter plot of dates and frequencies of these shared primes would provide a little more information.
 Bad system entropy is probably causing a lot of the duplicate moduli (complete keys), but not the cases where only one factor is shared. To me, that seems like an implementation weakness skewing the distribution of (probable) primes.
