they don't have high enough entropy and output duplicates ~0.2% of the time.
Almost correct: They don't have enough entropy at least 0.2% of the time. These results are consistent with all primes being generated with ~40 bits of entropy -- which would imply that all the keys are broken. Obviously we hope that this is not the case and instead it's just a small subset of primes which were generated with low entropy, but the data do not provide any evidence in either direction.
the graph of the moduli contained in c distinct certificates consists, in an ideal world, of c disjoint connected components each consisting of two vertices joined by a single edge, for a total of 2c vertices and c edges ... tens of millions of [lab-]generated RSA moduli turned out to behave as expected based on the above assumption.
That implies that there is at least one (presumably common) method for generating RSA keys which does not frequently output duplicate primes.
Apparently not. The number of unique pairings among 6 million is 5,999,999 factorial. Which is a _really_ large number.
We can take advantage of the cancellation properties of the factorial to write this as N(N-1)...(N-k+1)/k!. If k=2 this resolves to N(N-1)/2.
You could also derive it like this: for ordered pairs, you have N choices for the first and N-1 for the second (since you can't pair anything with itself, but all other choices are permitted). This counts each unordered pair exactly twice, since a pair and its reverse are the same if you don't care about order. So N(N-1)/2.
Still trillions of combinations. Not 5999999! though.
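For concreteness, here's the count for a hypothetical N = 6 million (a stand-in round number for the dataset size), which works out to about 1.8 × 10^13:

```python
from math import comb, factorial

# Unordered pairs among N items: C(N, 2) = N(N-1)/2
N = 6_000_000
pairs = comb(N, 2)
assert pairs == N * (N - 1) // 2
print(pairs)  # 17999997000000 -- trillions, but nowhere near 5999999!
```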
I still had absolutely no clue what the title meant until grandpa pointed it out. I agree with locacorten: Horrible title.
It does need to (1) give a clue whether you want to read it and (2) let you recall what the paper is about when you see the title on a later occasion.
In this instance, #1 goes like this: "Oh, it's by Lenstra. And it's got a cutesy title. It's got to be worth a look, if only to see what on earth the title means." And #2 ... well, do you think any interested party who read this paper is going to have any difficulty remembering what it's about, say, 10 years from now? I don't.
You can't get away with this sort of thing if you aren't as big a name as Lenstra -- or if your paper isn't about something vitally important that (in hindsight) is easily associated with the paper title. But that's OK. Most of Lenstra's papers have titles like "Fast Irreducibility and Subgroup Membership Testing in XTR" and "Factoring Multivariate Polynomials over Algebraic Number Fields".
Or are you saying, "Ron Rivest was wrong, Whitfield Diffie was right" is still a horrible title?
The whole paper had the feel of a rump session publication, didn't it?
Yes. That title made me think "ok, what famous argument did they have which this is referring to?" (the way "Torvalds was wrong, Tanenbaum was right" would) and I can't think of any famous arguments about whether people have enough entropy in their prime generation.
Anyways I take your point. The paper really doesn't have too much to do with the title.
The number of primes involved is way down on my list of important differences between DH and RSA.
What would be nice to see is the age of the affected certificates graphed as a scatter plot. It would be interesting to note whether there are patterns of grouping. We all read the articles from Peter Goodman et al., where he demonstrates horrible entropy [0,0,0,0,0,0,0,0].
I personally liked it and am somewhat surprised that cperciva didn't. That alone made reading this conversation interesting as far as I'm concerned. And is going to make me think harder about opting for titles on my papers that opt for Whit over content.
It would have discovered a few problems much earlier, but it conceivably leaks information when your CSR gets rejected.
pg2 (last graf into pg3)
Publication of results undermining the security of live keys is uncommon and inappropriate, unless all affected parties have been notified. In the present case, observing the above phenomena on lab-generated test data would not be convincing and would not work either: tens of millions of thus generated RSA moduli turned out to behave as expected based on the above assumption. Therefore limited to live data, our intention was to confirm the assumption, expecting at worst a very small number of counterexamples and affected owners to be notified. The quagmire of vulnerabilities that we waded into, makes it infeasible to properly inform everyone involved...
"Of course when we generate keys ourselves they work fine. We have no idea why there are so many collisions in the real world, and there are so many plausible suspects we don't even know where to start notifying people."
pg3 (last graf)
As was the case with the Debian OpenSSL vulnerability, we believe that publication is preferable to attempts to cover up weaknesses. There is a risk, but people desiring to exploit the current situation will have to redo our work, both the data collection and the calculation, as we have put our data under custody. We hope that people who know how to do the computation quickly will show good judgment.
Translated: "On the scale of crypto attacks this is so easy to do that just talking about it discloses the attack; please don't be dicks about that."
... Except for eight times e = 1 and two even e-values among the PGP RSA keys ...
... Two e-values were found that, due to their size and random appearance, may correspond to a short secret exponent (we have not investigated this).
Good reference: Boneh at http://www.ams.org/notices/199902/boneh.pdf --- under "Elementary attacks"
Looking at the owners with shared n-values among the relevant set of 266 729 X.509 certificates, many of the duplications are re-certifications or other types of unsuspicious recycling of the same key material by its supposedly legal owner. It also becomes clear that any single owner may come in many different guises. On the other hand, there are also many instances where an n-value is shared among seemingly unrelated owners. Distinguishing intentionally shared keys from other duplications (which are prone to fraud) is not straightforward, and is not facilitated by the volume of data we are dealing with (as 266 729 cases have to be considered). We leave it as a subject for further investigation into this "fuzzy" recognition problem to come up with good insights, useful information, and reliable counts.
Translation: "This is not the only WTF we found in the survey."
Although 512-bit and 768-bit RSA moduli were factored in 1999 and 2009, respectively, 1.6% of the n-values have 512 bits (with 0.01% of size 384 and smallest size 374 occurring once) and 0.8% of size 768. Those moduli are weak, but still offer marginal security. A large number of the 512-bit ones were certified after the year 2000 and even until a few years ago. With 73.9% the most common size is 1024 bits, followed by 2048 bits with 21.7%. Sizes 3072, 4096, and 8192 contribute 0.04%, 1.5%, and 0.01%, respectively. The largest size is 16384 bits, of which there are 181 (0.003%).
Most certs have 1024 bit keys, but there are still semirecent 512 bit certs out there.
Generation of a regular RSA modulus consists of finding two random prime numbers. This must be done in such a way that these primes were not selected by anyone else before. The probability not to regenerate a prime is commensurate with the security level if NIST's recommendation is followed to use a random seed of bit-length twice the intended security level. Clearly, this recommendation is not always followed.
The nut graf of the paper, right?
Much larger datasets can be handled without trouble. We chose not to describe the details of our calculation or how well it scales (sapienti sat). On a 1.8GHz i7 processor the straightforward approach would require about ten core-years and would not scale well. Inclusion of the p and q-values from Section 4 and the primes related to the Debian OpenSSL vulnerability did not produce additional results.
Sapienti sat: "a word to the wise", which I take to mean, if you understand basic RSA attacks, this stuff is pretty straightforward. Notable: the Debian OpenSSL CSPRNG bug didn't seem to contribute to this finding.
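The computation the authors are coy about is widely believed to be a batch-GCD calculation (Bernstein's product/remainder-tree trick) rather than the naive all-pairs gcd. A minimal sketch with toy moduli, under that assumption; a full-scale version would replace the direct `P % (n*n)` step with a remainder tree so it scales to millions of moduli:

```python
from math import gcd, prod

def batch_gcd(moduli):
    """For each modulus n, compute gcd(n, product of all the others).

    A result > 1 means n shares a prime with some other modulus in the
    list. Sketch only: P is the product of everything, and since
    P = n * R (R = product of the others), (P % (n*n)) // n == R % n,
    so gcd(n, that) == gcd(n, R)."""
    P = prod(moduli)
    return [gcd(n, (P % (n * n)) // n) for n in moduli]

# Toy data: the first two moduli accidentally share the prime 101.
moduli = [101 * 103, 101 * 107, 109 * 113]
print(batch_gcd(moduli))  # [101, 101, 1]
```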
Irrespective of the way primes are selected (additive/sieving methods or methods using fresh random bits for each attempted prime selection), a variety of obvious scenarios is conceivable where poor initial seeding may lead to mishaps, with duplicate keys a consequence if no "fresh" local entropy is used at all. If the latter is used, the outcome may be worse: for instance, a not-properly-random first choice may be prime right away (the probability that this happens is inversely proportional to the length of the prime, and thus non-negligible) and miss its chance to profit from local entropy to become unique. But local entropy may lead to a second prime that is unique, and thus a vulnerable modulus.
To me the most interesting graf: here's a story about how generating a random prime is tricky even when you have a reliable source of random bits. You write a prime generator that taps entropy only after each guess. If your first guess is prime, you never tap entropy at all; you generate "correct" but insecure results.
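The failure mode the paper hypothesizes might look something like this in code. This is a hypothetical sketch, not any real library's implementation; the names and structure are invented for illustration:

```python
import random
import secrets

def is_probable_prime(n, rounds=20):
    """Miller-Rabin primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def flawed_prime(bits, seed):
    """Hypothetical flawed pattern: the FIRST candidate is derived
    entirely from a possibly low-entropy seed; fresh local entropy is
    only mixed in when a candidate fails the primality test. If the
    very first guess happens to be prime (probability ~ 2/ln(2^bits),
    i.e. non-negligible), no fresh entropy is ever tapped, and two
    parties starting from the same seed get the same prime."""
    candidate = random.Random(seed).getrandbits(bits) | (1 << (bits - 1)) | 1
    while not is_probable_prime(candidate):
        # Only retries benefit from fresh entropy.
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
    return candidate
```

For any seed whose first candidate happens to be prime, two "independent" runs collide, which is exactly the duplicate-prime symptom the paper observed.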
Then, yadda yadda, I skipped all the stuff about DSA and ElGamal except:
The only interesting fact we can report about ECDSA is the surprisingly small number of certificates encountered that contain an ECDSA key (namely, just one), and the small number of certificates signed by ECDSA (one self-signed and a handful of RSA keys).
Yay for progress!
The lack of sophistication of our methods and findings make it hard for us to believe that what we have presented is new, in particular to agencies and parties that are known for their curiosity in such matters. It may shed new light on NIST's 1991 decision to adopt DSA as digital signature standard as opposed to RSA, back then a "public controversy"; but note the well-known nonce-randomness concerns for ElGamal and (EC)DSA and what happens if the nonce is not properly used.
OH SNAP, NSA. This is so easy even YOU could figure it out.
Regarding nonce-randomness: this is what happened to Sony; if nonces to DSA are predictable, the whole algorithm falls.
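The algebra behind that failure is easy to demonstrate with toy DSA parameters (tiny made-up numbers, chosen only so the arithmetic is visible; nothing here is secure): two signatures made with the same nonce k leak the private key.

```python
# Toy DSA parameters: q = 101 is prime, p = 607 = 6*q + 1,
# g = 2^((p-1)/q) mod p generates the order-q subgroup.
p, q = 607, 101
g = pow(2, (p - 1) // q, p)

x = 57                       # private key
k = 30                       # the SAME nonce, reused for two messages

def sign(m, k):
    r = pow(g, k, p) % q
    s = pow(k, -1, q) * (m + x * r) % q
    return r, s

m1, m2 = 10, 77
r1, s1 = sign(m1, k)
r2, s2 = sign(m2, k)
assert r1 == r2              # same k => same r: reuse is visible on the wire

# s1 - s2 = k^-1 * (m1 - m2) mod q, so k falls out of the pair of sigs,
# and then x falls out of either signature equation.
k_rec = (m1 - m2) * pow(s1 - s2, -1, q) % q
x_rec = (s1 * k_rec - m1) * pow(r1, -1, q) % q
assert (k_rec, x_rec) == (k, x)
print("recovered private key:", x_rec)  # 57
```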
The remaining connected component is the most intriguing - or suspicious - as it is a complete graph on nine vertices (K9): nine primes, each of whose 9C2 = 36 pairwise products occurs as an n-value.
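To make the K9 structure concrete, here it is with nine small stand-in primes (the real ones are of course secret). Every one of the 36 moduli shares a prime with 14 of the other 35, so any one gcd against a neighbor factors it completely:

```python
from itertools import combinations
from math import comb, gcd

# Nine stand-in primes playing the role of the paper's nine.
primes = [101, 103, 107, 109, 113, 127, 131, 137, 139]
moduli = [a * b for a, b in combinations(primes, 2)]
assert len(moduli) == comb(9, 2) == 36

# Each modulus p*q shares p with 7 others and q with 7 others: 14 total.
n = moduli[0]                                   # 101 * 103
sharing = [m for m in moduli[1:] if gcd(n, m) > 1]
assert len(sharing) == 14

shared_prime = gcd(n, sharing[0])
assert 1 < shared_prime < n and n % shared_prime == 0
```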
There's an RSA key generator out there which randomly picks two primes from a set of 9.
That is, I like the idea of someone trying to be a bit too clever and accidentally borking their code due to their unfamiliarity with math rather more than I like the idea of that kind of negligence.
Of course, who's to say that a single generator isn't responsible for more than one connected component of the graph...?
For what it's worth, my impression of that text was not as much about whether they could figure it out, but that they pretty much must have at this point given how simple it was. I don't think they were dissing the capabilities of the NSA, but simply trying to provide insight on the likelihood that this was included in them, and they concluded it almost certainly was.
Which reminds me of a conversation I was having the other day which noted that there's not much fear of the NSA scooping you as a researcher because they don't really like publishing things they find useful. :)
It's just our weird and somewhat arrogant perspective, I think. :)
Factoring one 1024-bit RSA modulus would be historic. Factoring 12720 such moduli is a statistic.
Generating the primes locally with insufficient independence seems more understandable than the same primes being used by independent parties; the latter hints at a more severe problem with random number generation. Regardless, assuming this is confirmed, it is a significant problem that needs to be hunted down.
If you have n1 = p*q1 and n2 = p*q2, so that the prime p is shared between them, then gcd(n1, n2) = p. Euclid's algorithm finds that in a handful of bignum modulo operations.
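In code, with toy primes standing in for 512-bit ones:

```python
from math import gcd

# Two moduli that accidentally share the prime p = 101:
p, q1, q2 = 101, 103, 107
n1, n2 = p * q1, p * q2

shared = gcd(n1, n2)         # Euclid: just a few bignum modulo steps
assert shared == p
# Knowing one factor breaks both keys outright:
assert (n1 // shared, n2 // shared) == (q1, q2)
```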
If this is a pattern of simple software flaws, or something environmental like keys generated during cold-start entropy starvation, the likelihood of pulling a truly valuable key out of that soup is even lower: generally (not always, but generally) the more valuable keys are generated using very well known software, and under tight process constraints.
So basically you get ~10k random certs that allow you to MITM ~10k random sites. It's not nothing, but criminals have better ways of monetizing attacker effort.
Would be the authors' response to that.
(The paper itself makes the DSA point.)
My comment was in regards to crypto usage in general where the most problematic factor is the misuse of PRNG and seeds. (ECDSA PS3 for example).
edit: The article points towards a flaw in the CSPRNG and prime generation.
Obviously, if unrelated people around the world are at random times coming up with the same (ostensibly) random gigantic prime factors, there is a randomness issue.
The point the paper makes is that this isn't necessarily a CSPRNG issue. The issue is not necessarily that people are using rand() to generate primes. It's that to generate factors for RSA, you need more than a CSPRNG: you need a prime number generator that correctly uses the CSPRNG. The paper hypothesizes one very plausible prime number generator that would be insecure even if driven off hardware entropy.
I've tried to do some quick research, and the method of generating random primes seems to have been added between FIPS 186-2 and FIPS 186-3. Since FIPS 186-3 only came out (approved, not draft) in 2009, it's probable (feel free to prove this extrapolation wrong) that these older key pairs were generated by an older toolkit, or at a time when this random-prime-generation recommendation was unknown. So once again we're bitten by these multi-year certificates.
Have they ruled out a buggy or backdoored key generator as the cause? Or is running down the cause left for some future research?
Maybe it's a good idea to tag a public key with the software/version used to create it. Debian fiasco keys are easy to pick out - the entropy was so low that the bad keys can be enumerated. This new attack relies on any small weakness in the distribution of primes, and it seems quite hard to track down responsible versions of software.
I wonder how many of these shared factors aren't the actual value of p or q chosen at generation time (primality tests are probabilistic).
I'm a full-disclosure guy at heart, but why would I want to advertise to the attacker my potential weaknesses?
Wouldn't I find my own personal selfish security is maximized when I falsify that field when generating my keys?
It depends whether you're the type of user who would rather work out for themselves that their keys are vulnerable because of the software used, or the type of user who would rather be informed by an automated script that their keys are vulnerable.
I am guessing there are many in the latter category and a few in the former category but at the same time most of the people who decide these things will live in the former.
As for you, marshray, personally, I'm going to go with yes. But does that really make things worse for those who opt in to the potential for notifications? I think you can make your security tradeoff and others can make theirs and both categories of users will end up happy.
To see if it could potentially be an afflicted version of a toolkit, a scatter plot of the dates and frequencies of these shared primes would provide a little more information.