Posting what I said in the other thread: As far as I can tell, this is what they're doing: if two different keys have a factor in common (i.e. A=PQ, B=PR), then you can use Euclid's algorithm (which just requires repeated subtraction, and is thus really easy) to find P (=gcd(A,B)), and then just use division to find Q (=A/P) and R (=B/P) easily.
So what the researchers did, apparently, was to gather all the RSA public keys they could find (6 million or so) and then calculate the gcds of all pairs of keys. (Presumably they were slightly more clever than this, but it doesn't really matter--doing this is still vastly easier than factoring even a single public key). Whenever they found a gcd that wasn't equal to 1, they'd cracked (at least) 2 keys. And it turns out that roughly 0.2% of the keys could be cracked in this way (for a total of 12720 1024-bit moduli).
As far as I can tell, the problem isn't with RSA itself, but with whatever algorithms are used to pick two random primes to multiply and produce the key (or moduli, whatever)--they don't have high enough entropy and output duplicates ~0.2% of the time.
As far as I can tell, the problem isn't with RSA itself, but with whatever algorithms are used to pick two random primes to multiply and produce the key (or moduli, whatever)...
they don't have high enough entropy and output duplicates ~0.2% of the time.
Almost correct: They don't have enough entropy at least 0.2% of the time. These results are consistent with all primes being generated with ~40 bits of entropy -- which would imply that all the keys are broken. Obviously we hope that this is not the case and instead it's just a small subset of primes which were generated with low entropy, but the data do not provide any evidence in either direction.
the graph of the moduli contained in c distinct certificates consists, in an ideal world, of c disjoint connected components each consisting of two vertices joined by a single edge, for a total of 2c vertices and c edges ... tens of millions of [lab-]generated RSA moduli turned out to behave as expected based on the above assumption.
That implies that there is at least one (presumably common) method for generating RSA keys which does not frequently output duplicate primes.
Not so. Given a set of N elements from which you choose k (with order not mattering, since you only care whether a given two are the same) the number of pairs is N!/(k!(N-k)!). So it's (6000000!)/(2!(6000000-2)!).
We can take advantage of the cancellation properties of the factorial to write this as N\(N-1)\...\(N-k+1)/k! . If k=2 this resolves to N(N-1)/2.
You could also derive it like this: for ordered* pairs, you have N choices for the first and N-1 for the second (since you don't pair anything with itself but all other choices are permitted). This counts each pair exactly twice: each pair and its reverse are the same if you don't care about order. So N(N-1)/2.
Still trillions of combinations. Not 5999999! though.
At the risk of redundancy (I wrote my post while you posted): you're double counting (since gcd(a,b) = gcd(b,a), you needn't compute both; either tells you whether a,b have a common factor. (a,b) and (b,a) having a common factor both account for the same problem) and you're including pairs of the same element (gcd(a,a)=a always, but this is meaningless for the data set we're considering--we don't care that a number has common factors with itself, only with other numbers)
The title of a paper doesn't need to tell you exactly what the paper has in it. (It can't, except in those cases where the content of the paper is perfectly expressible in a single snappy sentence.)
It does need to (1) give a clue whether you want to read it and (2) let you recall what the paper is about when you see the title on a later occasion.
In this instance, #1 goes like this: "Oh, it's by Lenstra. And it's got a cutesy title. It's got to be worth a look, if only to see what on earth the title means." And #2 ... well, do you think any interested party who read this paper is going to have any difficulty remembering what it's about, say, 10 years from now? I don't.
You can't get away with this sort of thing if you aren't as big a name as Lenstra -- or if your paper isn't about something vitally important that (in hindsight) is easily associated with the paper title. But that's OK. Most of Lenstra's papers have titles like "Fast Irreducibility and Subgroup Membership Testing in XTR" and "Factoring Multivariate Polynomials over Algebraic Number Fields".
Or are you saying, "Ron Rivest was wrong, Whitfield Diffie was right" is still a horrible title?
Yes. That title made me think "ok, what famous argument dud they have which this is referring to?" (the way "Torvalds was wrong, Tanenbaum was right" would) and I can't think of any famous arguments about whether people have enough entropy in their prime generation.
To me, the biggest difference between DH and RSA is that DH is stateful and RSA is stateless. Next up is that with DH you're generating random exponents each time instead of doing it once during key generation.
The number of primes involved is way down on my list of important differences between DH and RSA.
The whole title is horrible because of its not even related to the outcome of the experiment/paper. Ron and Whit are neither both right nor both wrong, nothing in the article even came close to proving such a title. Using bad entropy is wrong.
What would be nice to see is the age of the affected certificates graphed in a scatter. It would be interesting to note if there are patterns of grouping. We all read the articles from Peter Goodman et al where he demonstrates horrible entropy [0,0,0,0,0,0,0,0].
Of course, but that doesn't make it beyond criticism!
I personally liked it and am somewhat surprised that cperciva didn't. That alone made reading this conversation interesting as far as I'm concerned. And is going to make me think harder about opting for titles on my papers that opt for Whit over content.
This paper has as much social science commentary as math. Things I found interesting or amusing:
pg2 (last graf into pg3)
Publication of results undermining the security of live keys is uncommon and inappropriate, unless all affected parties have been notified. In the present case, observing the above phenomena on lab-generated test data would not be convincing and would not work either: tens of millions of thus generated RSA moduli turned out to behave as expected based on the above assumption. Therefore limited to live data, our intention was to confirm the assumption, expecting at worst a very small number of counterexamples and affected owners to be notified. The quagmire of vulnerabilities that we waded into, makes it infeasible to properly inform everyone involved...
"Of course when we generate keys ourselves they work fine. We have no idea why there are so many collisions in the real world, and there are so many plausible suspects we don't even know where to start notifying people.
pg3 (last graf)
As was the case with the Debian OpenSSL vulnerability, we believe that publication is preferable to attempts to cover up weaknesses. There is a risk, but people desiring to exploit the current situation will have to redo our work, both the data collection and the calculation, as we have put our data under custody. We hope that people who know how to do the computation quickly will show good judgment.
Translated: "On the scale of crypto attacks this is so easy to do that just talking about it discloses the attack; please don't be dicks about that."
... Except for eight times e = 1 and two even e-values among the PGP RSA keys ...
... Two e-values were found that, due to their size and random appearance, may correspond to a short secret exponent (we have not investigated this).
Looking at the owners with shared n-values among the relevant set of 266 729 X.509 certicates, many of the duplications are re-certications or other types of unsuspicious recycling of the same key material by its supposedly legal owner. It also becomes clear that any single owner may come in many different guises. On the other hand, there are also many instances where an n-value is shared among seemingly unrelated owners. Distinguishing intentionally shared keys from other duplications (which are prone to fraud) is not straightforward, and is not facilitated by the volume of data we are dealing with (as 266 729 cases have to be considered). We leave it as a subject for further investigation into this "fuzzy" recognition problem to come up with good insights, useful information, and reliable counts.
Translation: "This is not the only WTF we found in the survey."
Although 512-bit and 768-bit RSA moduli were factored in 1999 and 2009, respectively, 1.6% of the n-values have 512 bits (with 0.01% of size 384 and smallest size 374 occurring once) and 0.8% of size 768. Those moduli are weak, but still offer marginal security. A large number of the 512-bit ones were certified after the year 2000 and even until a few years ago. With 73.9% the most common size is 1024 bits, followed by 2048 bits with 21.7%. Sizes 3072, 4096, and 8192 contribute 0.04%, 1.5%, and 0.01%, respectively. The largest size is 16384 bits, of which there are 181 (0.003%).
Most certs have 1024 bit keys, but there are still semirecent 512 bit certs out there.
Generation of a regular RSA modulus consists of finding two random prime numbers. This must be done in such a way that these primes were not selected by anyone else before. The probability not to regenerate a prime is commensurate with the security level if NIST's recommendation is followed to use a random seed of bit-length twice the intended security level. Clearly, this recommendation is not always followed.
The nut graf of the paper, right?
Much larger datasets can be handled without trouble. We chose not to describe the details of our calculation or how well it scales (sapienti sat). On a 1.8GHz i7 processor the straightforward approach would require about ten core-years and would not scale well. Inclusion of the p and q-values from Section 4 and the primes from related to the Debian OpenSSL vulnerability did not produce additional results.
Sapienti sat: "a word to the wise", which I take to mean, if you understand basic RSA attacks, this stuff is pretty straightforward. Notable: the Debian OpenSSL CSPRNG bug didn't seem to contribute to this finding.
Irrespective of the way primes are selected (additive/sieving methods or methods using fresh random bits for each attempted prime selection), a variety of obvious scenarios is conceivable where poor initial seeding may lead to mishaps, with duplicate keys a consequence if no "fresh" local entropy is used at all. If the latter is used, the outcome may be worse: for instance, a not-properly-random first choice may be prime right away (the probability that this happens is inversely proportional to the length of the prime, and thus non-negligible) and miss its chance to profit from local entropy to become unique. But local entropy may lead to a second prime that is unique, and thus a vulnerable modulus.
To me the most interesting graf: here's a story about how generating a random prime is tricky even when you have a reliable source of random bits. You write a prime generator that taps entropy only after each guess. If your first guess is prime, you never tap entropy at all; you generate "correct" but insecure results.
Then, yadda dadda I skipped all the stuff about DSA and ElGamal except:
The only interesting fact we can report about ECDSA is the surprisingly small number of certicates encountered that contain an ECDSA key (namely, just one), and the small number of certicates signed by ECDSA (one self-signed and a handful of RSA keys).
Yay for progress!
The lack of sophistication of our methods and findings make it hard for us to believe that what we have presented is new, in particular to agencies and parties that are known for their curiosity in such matters. It may shed new light on NIST's 1991 decision to adopt DSA as digital signature standard as opposed to RSA, back then a "public controversy"; but note the well-known nonce-randomness concerns for ElGamal and (EC)DSA and what happens if the nonce is not properly used.
OH SNAP, NSA. This is so easy even YOU could figure it out.
Regarding nonce-randomness: this is what happened to Sony; if nonces to DSA are predictable, the whole algorithm falls.
> OH SNAP, NSA. This is so easy even YOU could figure it out.
For what it's worth, my impression of that text was not as much about whether they could figure it out, but that they pretty much must have at this point given how simple it was. I don't think they were dissing the capabilities of the NSA, but simply trying to provide insight on the likelihood that this was included in them, and they concluded it almost certainly was.
I'm sure you're right, but the premise I was (facetiously) working from is that the NSA would ordinarily be presumed to have figured something out regardless of the level of sophistication involved in the attack.
Ah. We academics are sticklers about novelty and we try and pin down what is and isn't. We also maintain the conceit that most of our published ideas are. As far as most of us are concerned, our ideas are probably novel unless published elsewhere. These types of things are notable exceptions.
Which reminds me of a conversation I was having the other day which noted that there's not much fear of the NSA scooping you as a researcher because they don't really like publishing things they find useful. :)
It's just our weird and somewhat arrogant perspective, I think. :)
Somewhat disappointing, but not surprising that they did not describe their actual calculation technique for finding the connected primes. It strikes me that there isn't a good excuse for having bad random numbers with the computation power we now have. A simple bitwise addition of various sources of supposedly random numbers is strictly stronger than any subset of its components. So we can combine lots of questionable strong sources and get a very strong source.
Generating the primes locally with insufficient independence seems more understandable than that the same primes are used by independent parties. That hints at a more severe problem with random number generation. Regardless, assuming this is confirmed, it is a significant problem that needs to be hunted down.
Hm, that's possible. A comprehensive search over 4 million pairs would be O(16 trillion) operations, so perhaps they gain something by reusing intermediate steps of the gcd, or something? Good point, I'm not sure. The brute-force solution is still not out of the question for a modestly large map-reduce cluster.
The chances of any one key being compromised are very, very low.
If this is a pattern of simple software flaws, or something environmental like keys generating during cold start entropy starvation, the likelihood of pulling a truly valuable key out of that soup are even lower: generally (not always, but generally) the more valuable keys are generated using very well known software, and under tight process constraints.
So basically you get ~10k random certs that allow you to MITM ~10k random sites. It's not nothing, but criminals have better ways of monetizing attacker effort.
A lot does depend on the actual circumstances for the bad RNG. If you had a good method of recognizing the valuable sites (anything involving money or maybe email?) it might be more practical considering a botnet would be just fine for this sort of computation.
Obviously, if unrelated people around the world are at random times coming up with the same (ostensibly) random gigantic prime factors, there is a randomness issue.
The point the paper makes is that this isn't necessarily a CSPRNG issue. The issue is not necessarily that people are using rand() to generate primes. It's that to generate factors for RSA, you need more than a CSPRNG: you need a prime number generator that correctly uses the CSPRNG. The paper hypothesizes one very plausible prime number generator that would be insecure even if driven off hardware entropy.
I understand your point now and reread that end section 3 of the article. The authors seem to imply that it is possibly resultant from a toolkit which a) doesn't implement the generation of provable primes correctly and therefor b) doesn't comply with NIST standards and will be not be considered for FIPS 186-3 approval.
I've tried to do quick research and the method of generating random primes seems to have been added between FIPS 186-2 and FIPS 186-3. Since FIPS 186-3 only came out (approved not draft) in 2009 its probably, feel free to prove this extrapolation wrong, that these older key pairs could've been generated by an older toolkit, or in a time where this random prime generation recommendation was unknown. So once again we're bitten by these multi-year certificates.
They did not rule that out, but when they write things like "the quagmire of vulnerabilities that we waded into, makes it infeasible to properly inform everyone involved", I don't get the sense that "backdoor" is top of mind.
Utterly simple and completely obvious in retrospect -> game changing.
Maybe it's a good idea to tag a public key with the software/version used to create it. Debian fiasco keys are easy to pick out - the entropy was so low that the bad keys can be enumerated. This new attack relies on any small weakness in the distribution of primes, and it seems quite hard to track down responsible versions of software.
I wonder how many of these shared factors aren't the actual value of p or q chosen at generation time (primality tests are probabilistic).
> Wouldn't I find my own personal selfish security is maximized when I falsify that field when generating my keys?
It depends if you're the type of user who would like to rely on hearing themselves that their keys are vulnerable because of the software used or if you're the type of user who would like to be informed by an automated script that their keys are vulnerable.
I am guessing there are many in the latter category and a few in the former category but at the same time most of the people who decide these things will live in the former.
As for you, marshray, personally, I'm going to go with yes. But does that really make things worse for those who opt in to the potential for notifications? I think you can make your security tradeoff and others can make theirs and both categories of users will end up happy.
Hence the 'maybe'. But what information would actually be leaked by knowing the generation software (assuming it's decoupled from the server software)? It seems if a certain implementation's keys are weak, one could simply try to exploit that over all keys. I don't need to know you're running Debian before I go scanning for those particular weak keys.
The responsible version of software are armchair cryptographers or average programmers trying to get things to 'work'. I bet this all results simply using OpenSSL with a SAMPLE to seed a PRNG or a bad seed of 0's.
To see if it could potentially be a afflicted version of a toolkit, a scatter plot of dates and frequencies of these shared primes would provide a little more information.
Bad system entropy is probably causing a lot of the duplicate moduli (complete keys), but not the cases where only one factor is shared. To me, that seems like an implementation weakness skewing the distribution of (probable) primes.