It irks me that in security advisories that fix a possible backdoor—like here—sometimes no root cause analysis is done or communicated to the public. Who chose this parameter? Who wrote the code? Who committed it? So I did a little sleuthing...
I'm pretty sure that when you generate a prime you're using the Miller–Rabin primality test in which case you only probabilistically choose a prime.
In fact, the is_prime functions in openssl don't check that a number is certainly prime. They only check that a number is prime with probability at least 1 - 2^-80. I'm not sure what the implications are, though.
This number (removed, as it seemed to cause problems in some browsers, sorry) fails the Fermat test for base 2 (i.e. 2^(p-1) is not 1 mod p). I can't believe it would pass Miller-Rabin. Edit: it fails all bases up to 1000.
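If you want to reproduce that at home, the check is a one-liner with Python's built-in pow. The hex string below is just a placeholder, since the number was removed from the comment; paste the real constant in its place:

```python
# Base-2 (and up to base-1000) Fermat check, as described above.
# P_HEX is a placeholder, not the actual socat constant.
P_HEX = "00c0ffee"
p = int(P_HEX, 16)

# For a prime p and gcd(a, p) = 1, Fermat says a^(p-1) ≡ 1 (mod p).
for a in range(2, 1001):
    if pow(a, p - 1, p) != 1:
        print(f"composite: base {a} is a Fermat witness")
        break
else:
    print("passes the Fermat test for all bases up to 1000")
```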
Assume you are given a single number, you test it, and is_prime says it is prime; in that case the 2^-80 figure applies as stated.
If on the other hand you are looking for a number that is_prime says is prime, and you are iterating through candidates, you need to know how likely it is to find a prime number in the first place to tell you how unlikely this is. In most cases the chance of a false positive will be much, much higher than 2^-80.
If for example, you only expected to find a true prime in 1 every 2^80 numbers anyway, there would be a 50% chance that a number you found is prime and a 50% chance it would not actually be.
There are approximately x / ln(x) primes below x. That means that there are 2^1024 / (1024 ln 2) - 2^1023 / (1023 ln 2) = 1.3e305 1024-bit primes. There are 2^1024 - 2^1023 = 9.0e307 1024-bit numbers. So, the chance that a randomly-selected 1024-bit number is prime is a little higher than one in a thousand, that is, a little higher than 2^-10.
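If you want to sanity-check those figures, the prime number theorem estimate takes a couple of lines of standard-library Python (the 1023 in the denominator is just ln(2^1023), the low end of the 1024-bit range):

```python
# Density of primes among 1024-bit numbers, per the prime number theorem.
from math import log, log2

density = 1 / (1023 * log(2))   # primes near 2^1023 have density ~1/ln(2^1023)
print(density)                  # ~0.0014, a little more than one in a thousand
print(log2(density))            # ~ -9.5, i.e. a little more than 2^-10
print(density * 2**1023)        # ~1.3e305 primes of 1024 bits, matching the estimate
```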
So Bayes' theorem doesn't save you here: it is still statistically unlikely that you will stumble upon a 1024-bit pseudoprime by mistake.
The standard usage in the field of crypto is that a phrase like "1024-bit prime" means a number between 2^1023 and 2^1024. For symmetric keys, yes, a "128-bit key" can start with a 0 and can even be all zeros. But for integers with mathematical properties like prime numbers, there's a big difference in the ability to e.g. factor the product of two "1024-bit primes" randomly chosen from [2^1023, 2^1024) and the product of two "1024-bit primes" randomly chosen from [0, 2^1024).
It matches the common-English usage of a phrase like "a six-figure salary." A salary of $020,000 isn't what's meant by the phrase.
You can verify this by, say, running `openssl genrsa 1024 | openssl rsa -noout -text` a few times and looking at the generated prime1 and prime2. They each have the 512th bit set. (They seem to be printed with a leading hex "00:", but there are 512/8 = 64 bytes afterwards, and the first byte always has the high bit set.)
I don't think Bayes' theorem applies very well here, because there's no practical way to check enough numbers such that the effect you're describing comes into play. 2^80 is approximately a million billion billion (10^24), so if you had a million CPUs each checking a billion numbers every second, it would still take a billion seconds (over 30 years) to check that many numbers. I suspect that's why such a small level of uncertainty was chosen in the first place.
Why would 10^24 be a million billion billion? It's 4 blocks of 6 zeros each, so that I'd expect you to call it a billion billion billion billion, if anything.
You know, I never know what to make of that logic - what if that tiny probability was exactly this one time? It's not like we saw it happen twice, and it could happen at some point. To my gut it seems you can't really know until you have other positive or negative observations.
I wonder if someone has compiled a list of very improbable events that have been observed.
For many years, in the computer science lab at the college where I sort of work, there was a "serious joke" written on the wall which said, with much better wording and some math to back it up, that the difference between a mathematician and an engineer is that the former was more concerned that a probabilistic primality test could inherently fail while the latter was more concerned that even a guaranteed algorithm was actually more likely to return the wrong answer because the computer was hit by a cosmic ray while it was determining if the number were prime.
2^-80 (~10^-24) is about as likely as being hit by a meteorite (~10^-16)[1] at exactly the same moment you are learning that you won the Powerball (~10^-8). Alternatively, it is as likely as winning Powerball in three consecutive drawings.
We happily incarcerate and (for countries that do so) execute people based on much weaker evidence. 2^-80 is absurdly small, to the point that essentially any other possibility is more likely.
You say you want to see it twice, but one data point with an error rate of 1 in 2^80 is, statistically, about a billion times more convincing than a million observations with an error rate of 1 in a million.
(That being said, there are a lot of explanations that don't involve malice, including honest error, bugs in any of the software products involved in the process, etc. But no, I don't believe that this was a 1 in 2^80 fluke.)
There are countless other possible causes that are much more probable than M-R returning a false positive. What should we do about all of them? Start ruling them out, one by one, sure, but in which order? Common sense says we should do it in the order of decreasing prior probability. The hypothesis "M-R returned a false positive" is there somewhere, in the space of all possible hypotheses, and we'll surely encounter it after all other more likely alternatives are exhausted. I bet we don't have to go that far.
My own list is of length 1 as a category of events. Back in July (it's a bit harder now), the odds that someone's calculated sha256 sum solved a new bitcoin block were around 2^-68. This very tiny probability event was observed on average once every 10 minutes.
2^-80 is absurdly small. But that only means something useful if the number of attempts isn't absurdly large.
I get what you're saying, but 2^-80 is very VERY improbable. Hitting it once is equivalent to a merely "very unlikely" event like winning powerball happening multiple times in a row.
Wow - this has got to be my most downvoted comment. I can't edit the original anymore, so here's my update:
I guess the harsh reaction came from the fact that I didn't define the scope very well: My question wasn't in reference to M-R specifically, just in general. I understand that in this case it makes sense to look at likelier causes (see Sharlin's response).
My point was that it's interesting to look at what happens (or what our reaction is) when very very very improbable events do happen. It seems weird to go with the assumption that because something is extremely unlikely that it won't happen.
When I roll a die 20 times, I get a particular arrangement of numbers. Given the total number of arrangements possible, that particular arrangement is extremely unlikely, yet I just got it.
A guy got struck by lightning 7 times (https://en.wikipedia.org/wiki/Roy_Sullivan). The odds of any person getting struck by lightning are 1 in 10000. Seven times in a row is 1 in 2^93. But then when you start drilling down, you see that he's a park ranger, and that he's out while lightning happens, which makes the probability that he'll get struck much higher.
If I had phrased the question to you asking what the likelihood of any given person in the world being struck seven times was, you could have calculated the former and said 2^-93 is such a small probability that it's not worth thinking about - and yet here is Roy Sullivan, so there's some sort of conflict in my logic. What's wrong with the former calculation?
Why is it that for any given person the probability is 2^-93 but for Roy it's somehow different, even though he is a "given person"? Is it that the 1 in 10000 number was wrong? But then if we look at all the people who never once got struck, it seems about right. If we inflate that number to 1 in 100 to make Roy likelier to get 7 in a row, then it seems everyone also should be getting shocked more often at least once or twice.
Or maybe it's that somehow the probability changes when we have more information and those two numbers and situations are not comparable on an absolute scale. Maybe if you get hit twice then you're much likelier to get hit again because you're probably in some dangerous location - but how was I to know to factor this in? It seems that it's very much about how you calculate the probability. Who knows what other hidden factors could be wildly affecting the true value of the probability?
That also makes me think - is there even such a thing as the "true" or inherent probability of an event happening?
edit: Or maybe it's the law of large numbers - given enough "trials" or in this case lightning events with people around, even something with an absurdly small probability is bound to happen eventually. But then why do we never factor that in and always just call it a day with 10000^7?
The catch with your logic is that "The odds of any person getting struck by lightning is 1 in 10000" is sloppy phrasing. Of the number of people studied, one ten-thousandth of them got struck by lightning, but lightning does not, as you have observed, choose people at random among those 10,000. Some people are constantly indoors; some are constantly outdoors. Some live in places with frequent thunderstorms; some don't. The only way to get that 1/10000 probability is to pick a person randomly. Picking Roy Sullivan isn't random; neither is picking me.
"The probability of a randomly-selected 1024-bit number passing this primarily test without being prime is 2^-80" is a much more precise statement, because you have selected that number randomly. Obviously, if you have a specific number in mind, the probability of that number being a false positive is either 0 or 1. It either is prime, or it isn't.
Remember that there is no such thing as a random number; there is only a randomized process for selecting numbers.
> Why is it that for any given person the probability is 2^-93 but for Roy it's somehow different, even though he is a "given person"?
Because it isn't true that the probability of any given person being struck by lightning is 1/10000. For instance, the internet says that men are around 4 times as likely to be struck by lightning as women. Rather, what's true is that the likelihood of a randomly selected person being struck by lightning is 1/10000. Each individual person has a different likelihood to be struck. I don't know how to calculate it for Roy, but given that he's been struck 7 times I'm sure it's way higher than 1/10000.
> That also makes me think - is there even such a thing as the "true" or inherent probability of an event happening?
I love it when people email me after reaching enlightenment.
There was a probability question that drove me nuts until I figured out what was going on. The question is "A woman has two children. One of her children is a boy. What is the probability that she has two boys?".
The question isn't well defined, because it matters how you learned that one of her children is a boy.
If you asked her "what is the gender of your oldest child?", and she says it's a boy, then the probability that she has two boys is 1/2.
But if you asked her "do you have at least one boy?", and she says yes, then the probability that she has two boys is 1/3.
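If you don't trust the arithmetic, a quick standard-library Python simulation of the two ways of learning "one of them is a boy" shows the two answers directly:

```python
# Simulate both phrasings of the two-children question described above.
import random

N = 10**6
oldest_is_boy = two_boys_given_oldest = 0
has_a_boy = two_boys_given_any = 0

for _ in range(N):
    kids = [random.choice("BG") for _ in range(2)]
    if kids[0] == "B":                # "what is the gender of your oldest child?" -> boy
        oldest_is_boy += 1
        two_boys_given_oldest += kids.count("B") == 2
    if "B" in kids:                   # "do you have at least one boy?" -> yes
        has_a_boy += 1
        two_boys_given_any += kids.count("B") == 2

print(two_boys_given_oldest / oldest_is_boy)  # ~0.5
print(two_boys_given_any / has_a_boy)         # ~0.333
```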
I don't know what the lesson here is. Probability is hard? Always ask about the experiment?
> is there even such a thing as the "true" or inherent probability of an event happening?
The probability of something depends on what you know. What's the chance that Roy will be struck by lightning tomorrow? If you don't know who Roy is, you might say 1/(10000 * 28000) (where 28000 days is the average human lifespan.). If you do know Roy's history, you'll probably bump that estimate up quite a bit. But if you look up the weather in his park tomorrow and see that it will be sunny all day, the probability will go back down close to zero.
Some things are so hard to know they might as well have an inherent probability, though. If you roll a die, no one's going to predict the outcome of the roll, so we might as well say it's inherently got a 1/6 chance of rolling a 5. And if you shine a photon at a half-silvered mirror, it's actually impossible to predict which way it will go, so I guess that really does have an inherent probability.
You've got to watch out for your probability estimates being wrong, though. I thought the probability of my friend flipping a coin and getting heads was 1/2, until he demonstrated that he could flip heads 10 times in a row.
> edit: Or maybe it's the law of large numbers - given enough "trials" or in this case lightning events with people around, even something with an absurdly small probability is bound to happen eventually. But then why do we never factor that in and always just call it a day with 10000^7?
Because even with 10 billion people, each living a billion days, and having a billion things happen to them in a day... well coincidentally that's exactly 10^28=10000^7, so never mind.
The likelihood of 1/2^80 is on the same order as me picking out a thousand grains of sand from the Sahara desert, spreading them randomly out through the desert and having you pick out those exact thousand grains of sand.
It's a practical impossibility, a philosophical exercise.
No, it's not the same order. 1/2^80 is the same order as the likelihood of picking 1000 grains out of 1010, and I'm sure you'll agree that the Sahara has more grains than 1010.
However, if you picked a single grain from the Sahara desert, you'd be only 2 or 3 orders of magnitude off, so one could say that a 2^-80 event is only slightly easier than finding one particular grain in the Sahara.
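For the curious, the "1000 grains out of 1010" figure checks out with exact arithmetic (standard-library Python, 3.8+ for math.comb):

```python
from math import comb, log2

ways = comb(1010, 1000)   # ways to choose 1000 grains out of 1010
print(log2(ways))         # ~78, so 1/ways is on the order of 2^-80
print(f"{2**80:.2e}")     # ~1.2e+24 for comparison
```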
Given that there is so much probability in the world, you would think you would want to be certain about the things you can be objectively 100% certain about, especially for one-off long-term choices that have big implications.
2^-80 is orders and orders of magnitude less likely than the chance of a bit error in RAM when performing the calculation. Or of fucking up the implementation of such a check. Or of a CPU error.
An attacker who can bruteforce, say, 2^80 128-bit keys (approximate limit of the computational power of the largest adversaries) has 1 chance out of 2^48 to break the security.
But there is only 1 chance in 2^80 that this parameter is a non-prime (i.e. that the primality test was fooled).
2^80 is much larger than 2^48, therefore it is not a problem.
If I want a 128-bit security level, I'm willing for a random guess of my key to have a 2^-128 chance of being correct. I'm likewise willing for a prime I generate to have a 2^-128/uses chance of not being prime (where uses is equal to the number of times I'll actually be using it).
I can't be 100% certain that a large number is actually prime. But I can be certain enough. Having a 2^-80 chance of being wrong isn't good enough when I want a 2^-128 security level.
Apples and oranges. When we talk about 128-bit security, we mean that it takes ~2^128 work to break it; not that there's a 2^-128 chance that it is broken.
Would it be reasonable to say that the following scenarios are more probable?
* The specific version of the software that was used to generate the non-prime contained a bug in its implementation of the algorithm
* The specific hardware upon which the software was run was flawed in a way that generated an incorrect output number from the program (i.e. on-disk corruption, RAM corruption, etc., undetected by the checksums in the hardware)
Both of those scenarios are more probable. But neither is anywhere nearly as probable as human error or human malice.
In the case of human error, it could be as simple as a bad cut-and-paste. In that case the number likely has a half-dozen or so factors, some of which should be found relatively easily.
It would be fantastic to see this fix accompanied by a patch that adds a test that runs that primality check again.
It's a check that probably doesn't need to run in CI over and over again, but boy would it be nice to have it codified so everyone reading now could reproducibly run the check if we felt the desire.
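As a sketch of what that could look like if codified: a pytest-style test using sympy's isprime, where DH_PRIME_HEX is a hypothetical placeholder for wherever the shipped constant ends up exposed, not socat's actual symbol:

```python
# Hedged sketch: checks that a hard-coded DH modulus is a (safe) prime.
from sympy import isprime

DH_PRIME_HEX = "00c0ffee"   # placeholder; paste the shipped hex constant here

def test_dh_modulus_is_prime():
    p = int(DH_PRIME_HEX, 16)
    assert isprime(p), "hard-coded DH modulus is not prime"

def test_dh_modulus_is_safe_prime():
    # Optional stronger check: for a safe prime, (p - 1) / 2 is also prime.
    p = int(DH_PRIME_HEX, 16)
    assert isprime((p - 1) // 2), "hard-coded DH modulus is not a safe prime"
```

(With the placeholder value the tests fail, which is exactly the behaviour you'd have wanted for the old socat constant.)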
There are one-in-a-thousand chances, and then there are one-in-2^80 chances. Wouldn't that be some "electrons in the universe" level coincidence? Believing in that borders on religion.
For anyone following along at home, you missed a D at the very end (0x6D). A number whose hexadecimal representation ends in 6 couldn't be prime, because it would be even.
(The rule "final digit divisible by d ⟺ number divisible by d" works for any d that is a divisor of the base the number is written in.)
In that message, Zhigang explicitly states, "I don't have enough knowledge to implement the merge." Yet the code was accepted without the slightest review whatsoever.
* subgroup-confinement attacks (where you send a public key made with a fake generator g) should be possible if the code is weak to them, because there must be low-order subgroups.
* the generator g might not have a large order. This is easy to test if you know how to factor p: the order of the multiplicative group (Z/pZ)* is Euler's totient function of p. If you know the order of the multiplicative group, then you have an algorithm to find the order of your generator: try the divisors of the group's order and take the smallest one that works (see the sketch below).
Unfortunately, if you don't know how to factor p then you can't easily do that.
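For concreteness, here's what that order-finding recipe looks like in Python with sympy, using toy numbers rather than the actual socat modulus:

```python
# Sketch: order of g in (Z/pZ)*, given the order of the full group
# (which you can only compute if you can factor p, hence the "unfortunately").
from sympy import factorint, totient

def element_order(g, p, group_order):
    order = group_order
    for q, e in factorint(group_order).items():   # prime factorization q^e of the group order
        for _ in range(e):
            if pow(g, order // q, p) == 1:        # g already trivial in the smaller subgroup
                order //= q
            else:
                break
    return order

p = 1019                                          # toy prime modulus
print(element_order(2, p, int(totient(p))))       # totient(p) == p - 1 for prime p
```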
Another question is: How can they know it's not a prime if they don't know the factorization of p? We have efficient provable tests for that: they tell you if p is prime or not and nothing else.
The way this post is written it sounds like you are asking these questions but then you go on to answer them. It's confusing. It took me a second read to realize you are actually not calling the weakness in to question but explaining why it really is a weakness.
I hate to say that a polynomial-time test isn't efficient because many people use that as the very definition of efficient, but my understanding is that AKS is incredibly impractical because the polynomial is ginormous, even though its asymptotic behavior is nice. So if you actually wanted to know if, say, a 1024-bit number was definitely prime, you wouldn't be able to run AKS on it in a "reasonable time" on a real computer.
Also, to further the discussion on probable vs provable: the probable tests are enough in our case because they tell us _provably_ if an integer is not a prime (which is what we care about here), but only _probably_ if an integer is a prime (which we don't care about here).
Just because there are exact tests doesn't make the AKS primality test usable in the real world, because of its constant factors. Using Fermat's little theorem for primality testing is fine because the probability of encountering a Carmichael number is very low.
You're right, AKS is not a very efficient algorithm and randomized tests are generally good enough. But, there are exact tests which do run fast in practice, such as ECPP. Here's some of the largest primes found using ECPP: http://primes.utm.edu/top20/page.php?id=27
Note the largest there is 30,950 digits, which is about 102,813 bits unless I did my math wrong, so I bet it is usable for 1024-bit numbers. Non-exact methods are much much faster of course, but when you really really need exactness it is an option.
Seems like a Carmichael number could be a good choice for an attacker. This would make your chances of encountering a Carmichael number quite high. You have to consider who's choosing the number. Or do I have it wrong?
Only the simple Fermat tests are vulnerable to Carmichael numbers. The test which is used in practice, Rabin-Miller, is not vulnerable to them. In fact, if you assume the extended Riemann hypothesis is true, (log n)^2 iterations of Rabin-Miller are sufficient to prove primality.
How efficient might this be compared to other approaches for searching for Riemann hypothesis counterexamples? Is there a one-to-one relationship between direct Riemann counterexamples and composite numbers that pass (log n)² Rabin-Miller tests?
I am not familiar with all the ways the GRH can be disproven, but Miller-Rabin doesn't strike me as the most efficient, with its running time of O((log n)^4) for each attempt. I am not sufficiently competent to say whether a GRH counterexample also directly results in a MR counterexample.
In addition to what pbsd said, Carmichael numbers are incredibly rare. Your odds of stumbling across one by chance are lower than the odds of a cosmic ray flipping the bit of memory that causes you to believe you have stumbled across one, as I recall.
So I'd say that's probably what happened here (i.e., p was reported composite by a randomized primality test, and that would never have happened had it been prime).
Oh sure, I don't mean that there's uncertainty that this number is composite, just that in a formal sense there isn't "proof" that other numbers that passed, say, openssl -checks 100000 are prime. But I wouldn't consider it unsafe to use them for cryptography.
The comment I was replying to referred to tests that "tell you if a number is prime or not", and the probable ones don't exactly always do that. :-)
The fact of the matter is that the sort of Platonic definition of proof that you fantasize is impossible because it depends on certainty, which itself cannot be proven. The best we can do is fail to find an error in an alleged proof.
Consider the proofreader's paradox: A proofreader can be prepared to swear, for any given page of a 1000 page book, that there are no typos on that page, but no proofreader is so foolish as to claim that there are no typos in the book.
>> Unfortunately, if you don't know how to factor p then you can't easily do that.
The link says they don't know where p came from. Presumably someone constructed it as a product of primes known only to them. I don't recall the state of the art in factorization, but if 1024 bits can be factored easily, that's news to me. So the weakness would only be exploitable by whoever created p.
One of the factors is 3684787 = 271 x 13597, so I suspect this is more accidental than malicious. The generator, 2, does not seem to have pathologically small order, but I didn't check very far.
Sage also has an is_prime function, which provably shows that p is not prime: "is_prime(p)". This takes less than 1ms to run. Proving primality of a 1024-bit prime in Sage takes a few seconds, and for a 2048-bit prime about a minute. Sage uses PARI for this, with sophisticated elliptic-curve-based algorithms called ECPP ("elliptic curve primality proving"), which are non-deterministic (unlike AKS) but fast in practice and provably correct. https://goo.gl/34uxJl
You're correct that the ring of integers mod n with n composite will have small multiplicative subgroups. But so will the integers mod p with p prime. At the very least, 1 and p-1 will always have orders 1 and 2, respectively. I could be wrong, but I don't think the primality or compositeness of the modulus alone tells you much about the smoothness of the group order.
Small subgroups may not always lead to key recovery, but they can lead to key dictation in certain protocols. So subgroup-confinement attacks are always a consideration in this setting.
If you want to learn more about subgroup-confinement attacks on DH and ECDH, check out set 8 of cryptopals. Mail set8.cryptopals@gmail.com with subject "Crazy Flamboyant for the Rap Enjoyment".
But once you have the factorization for p, since it's hardcoded, it's now much easier to break every DH key exchange used by this application. Getting that factorization would be very very difficult, but once you have it you can use it on everyone.
If it's not an easy factorization, it will be hard. According to recent results, we believe a state-sized adversary should be able to do it. If your threat model is criminals, then you might be OK.
EDIT: if 1024-bit factorization is easy in general, you can say goodbye to every 1024-bit RSA modulus. My first statement doesn't mean it can never be easy; it means that if you try to factor it the easy way and it doesn't work, you are in for a lot of work and research.
Well, it isn't known who provided the number in the first place. Whoever that unknown party is presumably knows the factorization they used to construct p. This could be a state, or could be a criminal enterprise. Better not to trust it!
> it's also pretty laughable to think that the NSA has an exhaustive list of encryption vulnerabilities
Why not? They have enough mathematicians and cryptographers on the payroll that they can analyse the major protocols and software in use. If you look at the recent attacks on TLS, especially the downgrade attacks to export-grade ciphers, they would be stupid not to exploit this for targeted attacks.
That being said, I doubt they have very good mathematical attacks, since a lot of researchers look at this stuff; but implementation bugs, side-channel attacks and plain bugs are plentiful.
most of the world runs openssl or CryptoAPI - why not spend a few man years looking through these implementations?
Well, at least if I were them, I'd have some programmers come up with automatic bug-finding techniques and fuzz the hell out of existing applications.
They also likely don't care about your encrypted Facebook chat, but government infrastructure, mobile telecommunication providers and lots of important enterprises are probably pretty often on their "we need to get in" list. If you can decrypt captured encrypted traffic, I imagine that would be a huge net win.
Because coming up with an exhaustive list of all the encryption software in the world would be difficult, let alone enumerating all of their vulnerabilities.
For practical purposes it should suffice if they create an exhaustive list of the encryption software that is used, say, in 99% of all cases. Finding those and documenting some vulnerabilities of each is quite a realistic goal.
It's very easy to detect those kinds of fuck-ups; you just have to look for them. We don't have the means to do it, and sometimes we are just ignorant or lazy. They are neither.
We assume actual human beings need to press buttons to detect obvious developer errors. I bet if an encrypted communication with any kind of bad (or even just popular default) parameter goes through anything the NSA oversees, it gets instantly attacked and put in a bin somewhere with your new software's name on it. The machine probably even picks a random name for the exploit once it finds one.
The NSA is old enough, and well resourced enough. Time and money solve these problems readily enough. In the case of FOSS, all you really need is a parser designed to hunt down specific sorts of flaws. We have static analysis for non-crypto needs; it seems reasonable that someone at the NSA got funded to write one for their use case.
If they have a complete IP traffic recording facility, they will be applying automatic classification to it and looking to make the classification as complete as possible. Anything that doesn't fit the existing categories will attract attention.
> While it's smart to assume so, it's also pretty laughable to think that the NSA has an exhaustive list of encryption vulnerabilities.
True, but in this case I'm sure they have enough hardware to factor any widely deployed primes used in crypto or semi-crypto comms software. After all, that is half of the NSA's job description.
Title is misleading -- this appears to be an issue with a tool called socat, not with OpenSSL. That's a world of difference in the severity of the issue.
The original phrasing is correct, but confusing if you aren't familiar with socat. Socat fundamentally works by giving it a pair of addresses; they are referring to socat's implementation of addresses starting with "OPENSSL:" (caps-insensitive), AKA "OpenSSL addresses".
This is a vulnerability in socat's TLS support. It has nothing to do with OpenSSL (besides the fact that OpenSSL provided a footgun API by leaving it to application developers to supply DH parameters).
It's an interesting challenge, though. I wonder if the person who picked the constant the first time understood the ramifications of it being prime or not. And if they did, how hard they worked to validate its primality.
If you happen to know the factorization and the factors are not too large (e.g. two 500-bit factors + some chaff), then you can just use Pohlig-Hellman algorithm to solve the DLP modulo each individual factor, combine the results and recover the shared Diffie-Hellman secret.
But without this trapdoor information (and, say, if p was chosen to be a Blum integer), computing the Diffie-Hellman shared secret is as hard as factoring that modulus (see https://crypto.stanford.edu/~dabo/abstracts/DHfact.html).
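Here's a toy sketch of that attack flow with sympy, using made-up small factors rather than anything related to the real socat modulus:

```python
# If you know the factorization of the composite "prime" p, solve the discrete
# log modulo each factor (Pohlig-Hellman style) and recombine with the CRT.
from sympy.ntheory import n_order
from sympy.ntheory.residue_ntheory import discrete_log
from sympy.ntheory.modular import solve_congruence

q1, q2 = 1009, 1013          # toy factors, known only to whoever built p
p = q1 * q2                  # the composite modulus shipped as a "prime"
g = 2
x = 123456                   # the victim's secret exponent
A = pow(g, x, p)             # the public value an eavesdropper sees

x1 = discrete_log(q1, A % q1, g)   # cheap here because the factors are tiny
x2 = discrete_log(q2, A % q2, g)

o1, o2 = n_order(g, q1), n_order(g, q2)
x_recovered, modulus = solve_congruence((x1, o1), (x2, o2))
print(x_recovered == x % modulus)  # True: x is learned modulo the lcm of the subgroup orders
```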
Someone should really write a set of unit tests available in every conceivable language, marked as "Please copy this unit test into your test base, and use it to verify all your primes are prime".
You can even make it a bit fuzzy: Miller-Rabin uses random numbers, right? So make it so that every time the unit test is run it generates new random values. Your test won't be deterministic, but it will fail at least some of the time, which should be enough to raise an alarm that there's a problem.
I'm aware that factoring a composite into primes is one of the most difficult computational problems, but isn't it cheap to determine if a number is prime? a^(p-1) == 1 mod p (where == means "congruent to"). With fast modular exponentiation, isn't that simple to compute?
The little Fermat test requires you to test many bases "a" in order to remove false positives. For most numbers, testing a few thousand bases is enough, but there are Carmichael numbers that break the test.
There are better primality checks, but they all have downsides (either slow but provable or fast but probabilistic). Finding prime numbers can be shown to be easier than finding factors (see: how GIMPS checks for primality), but that doesn't make it easy.
isPrime(p) is polynomial in the number of digits of p, but not in any useful way: it takes way too long, practically speaking. But that's only if you want a definite answer.
If you're willing to do a probabilistic test, you can use algorithms like Miller-Rabin: the more times you run it with random inputs, the more sure you are that the number is prime. And it's relatively easy to implement; check Wikipedia for the pseudocode.
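For reference, that pseudocode translates to something like this in Python (standard library only; with `rounds` random bases the false-positive probability for an odd composite is at most 4^-rounds):

```python
# Miller-Rabin probable-prime test, as described above.
import random

def miller_rabin(n: int, rounds: int = 40) -> bool:
    """Return False if n is definitely composite, True if probably prime."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % small == 0:
            return n == small
    # Write n - 1 as d * 2^r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False    # a is a witness: n is composite
    return True             # probably prime

print(miller_rabin(2**127 - 1))  # a Mersenne prime -> True
print(miller_rabin(3215031751))  # composite (151 * 751 * 28351); random bases catch it
```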
Nice catch. Now, the real question is who committed it, and how they came up with the number.
If it is a backdoor, it's pretty smart, because it's very deniable as a "stupid mistake". And if it's a stupid mistake, it's extra stupid, because this committer will have trouble convincing the world that that's all it was. At the very least, somebody needs to be going through all this person's commits with a fine-toothed comb.
How situations like this get handled in the face of a known threat is going to be interesting. You hate to ban or hinder a good programmer from a project, but once possibly-bitten, twice shy.
Aren't large non-primes usually created by multiplying two smaller (but still large) primes together? Factoring is then the challenge. Or is there more to this that I'm missing?
a) knew it was non-prime, and used it to weaken the crypto
b) knew it was non-prime, and used it because they didn't think it needed to be prime (which is a massive sin of ignorance)
c) grabbed 1024 bits of rand() and didn't check if it was prime (again, stupid)
d) grabbed some rand and checked the prime-ness using a bad method
e) used a "prime number generator" that produced bad output
I agree that making non-prime numbers is not terribly difficult, but the question of how they got the number is only interesting in that it gives info about why.
A bit could have been flipped in the software or the result of the function (true/false), but in the number itself there appear to be no single bit flips that make it prime (at least in the binary representation).
Edit: A single bit flip could have been used as "semi" plausible deniability in the case of malicious intent.
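The check itself is easy to reproduce; here's a sketch with sympy, where the hex value is just a placeholder for the real constant (which was removed upthread):

```python
# Flip each bit of p and see whether any single-bit variant is a probable prime.
from sympy import isprime

P_HEX = "00c0ffee"   # placeholder; substitute the actual 1024-bit constant
p = int(P_HEX, 16)

flips = [i for i in range(p.bit_length()) if isprime(p ^ (1 << i))]
print(flips)   # the claim above is that this list comes back empty for the real number
```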
It is the same clear path as Unix vs. closed-source research. Perhaps there will be some achievements privately, but all mind and matter is intermingled already and forever. Secrets are just another way of saying "obscure path"... feels?
>there is no indication of how these parameters were chosen
Is there really no protocol used in projects undertaken by the security community that would ensure that each component of the tools we rely on has a known history?
Well, socat has a git repo at git://repo.or.cz/socat.git so you can see the history of the prime number. In particular, it was upgraded from a 512 bit prime to a 1024 bit "prime" in commit 281d1bd on Jan 23, 2015.
Neither the code then nor the new code in this patch has comments indicating how the prime was generated. (It is only mentioned in the advisory that openssl dhparams was used for the recent patch.)
Might be an interesting exercise to search through github for large numeric constants and find ones that aren't prime, then manually check through those for ones that are supposed to be.
The original error could have been checked with a quick code review verifying that the provided number was in fact prime. This should have been done when the original patch was submitted.
I've tested some of them with PARI/GP. There is no probable prime at Hamming distance 1. There are several (probably 300-400, I haven't exhaustively listed them) probable primes at Hamming distance 2 (p ^ (1<<30) ^ (1<<14) is one example).
Well, the provided command takes 2.5 min for PARI to test the given "prime" on my computer. There are 16*256 numbers with a single hex digit mutation from the number given.
The change log message that introduced the new constant is interesting, too. "Socat did not work in FIPS mode because 1024 instead of 512 bit DH prime is required. Thanks to XXX for reporting and sending a patch."
> The power of the Baillie-PSW test comes from the fact that these lists of strong Fermat pseudoprimes and strong Lucas pseudoprimes have no known overlap. There is even evidence that the numbers in these lists tend to be different kinds of numbers.
A primality certificate could be included that allows the prime to be verified quickly during the build. This certificate for the 2048-bit prime took 90 seconds to generate, and can be verified in 3 seconds.
The primality test of the original number using gp that archgoon posted takes 6ms on my Sandy Bridge laptop, that's plenty fast. (The new number takes a minute to check, though, so that might really be too long)
Maxima's primep takes 360ms on the 1024-bit one for me, so seems quick enough to stick into the build.
It sounds like the chance of it passing the test is around 10^-15 using the default values, which seems sufficient for a build test that is run many times.
Kind of interesting that the first nonprime version ended in a comma, as if it originated as a fragment of a longer array of bytes. That's nonstandard C, and I'd expect that whatever tool(s) generate those arrays would know not to add a comma after the final entry.
<cynical thought>That's a nice piece of plausible deniability, huh? I wonder if the committer has a few extra bytes that he can claim were supposed to be there which make a number that passes all the primality tests? That'd be a nice excuse for the NSA/3PLA overlords to have given him...
(Actually I may be confusing the extra comma here with an extra comma at the end of an enum list. Some compilers are OK with that, while others aren't.)
If I'm reading N1256 right (not sure how far back it actually dates, but that's a C standard draft dated 2007), section 6.7.2.3 also explicitly allows a trailing comma in that context as well, so if such a compiler exists its makers should get with the program, so to speak.
For example, there are about 2⁸⁰ primes less than 10²⁶, that is, with 26 or fewer digits. (Where could you get 2⁸⁰ bits of storage, even if you could do the huge computation necessary to check each of these?) 1024-bit primes are about 308 digits long.
You can imagine trying to get an intuition for a few different kinds of things:
* Up to what point has the primality of every number been checked?
* What is the largest general-form number whose primality has been checked conclusively? (or, what is the largest-known general-form prime?)
* What's the largest general-form semiprime that has been factored into primes without foreknowledge of the factors?
* How big are the primes we use in cryptography?
* How big are the largest-known (special-form) primes?
These things are very different orders of magnitude. I'm not sure of the exact answers to the first two, but I expect the first is below 20 digits (the π(n) calculations apparently didn't check all the individual numbers). The answers to the last 3 are: 232 digits, 308 to 1233 digits, and 22338618 digits.
Pi would be a horrible source. Why would you want to use a deterministic digit-generation function to generate your entropy, even if you always used very large digit offsets? I can't imagine it being a remotely good idea.
Pi is transcendental, which means that it is not a root of a non-zero polynomial with rational coefficients.
Being transcendental does not imply that a number's expansion in a given base must include every digit string. Consider the number 1/10^1! + 1/10^2! + 1/10^3! + 1/10^4! + ....
This number, whose decimal expansion is 0.110001000000000000000001... is transcendental (proven by Liouville in 1844). Its decimal expansion clearly does not contain every decimal number. It only contains the digits 0 and 1, and after the first two places never even contains consecutive 1s.
It is known that "almost all" real numbers do in fact contain in their base b expansion every sequence of base b numbers, each sequence occurring with frequency proportional to its length. These are called "normal" numbers. Very few interesting numbers (where "interesting" means that we have some reason to be interested in aside from their normality) are known to be normal, though.
By "interesting" I mean a number that arises out of something else that one might be interested in.
For instance, consider pi. If you are interested in geometry, pi will turn up. If you are interested in number theory, pi will turn up (e.g., it is connected to zeta functions). If you are interested in probability and statistics, pi will turn up. If you are interested in differential equations, pi will come to the party.
If you somehow have never encountered pi, I can convey it to you by telling you about one of those things. For instance, I could tell you that it is the period of the non-zero solutions of the differential equation y'' + y = 0.
An uninteresting number would be one that has no known connection to other things. If I have a particular uninteresting number, and I want to convey it to you, I'll have to just tell you the number.
A random number would almost certainly be uninteresting, such as this hex fraction, which came from /dev/urandom on my computer: 0.bfdab557104bf2d8952fb1ea0adfd732794a353d5b35d95cda927f4ad8f6dd11f11b2e968298. It is extremely unlikely that anyone has ever seen that number before. The only known thing interesting about it is that it was made specifically as an example of a number that is not otherwise interesting.
Pi is transcendental, but that has nothing to do with "containing every number". A transcendental number is defined as a number which is not the root of any non-zero polynomial with rational coefficients. The two properties are not related. For example, the first example of a transcendental number (the Liouville numbers) isn't capable of being base-10 normal (it only contains the digits 0 and 1).
The property you're referring to is related to normality. A number that is normal in base b is one in which the frequency of each digit in that base approaches 1/b; such a number is not rational (and thus does not have cycles). Pi has not been proven to be normal, but if it were, then it would have the property of which you speak (which is an informal property implied by normality).
It's interesting that in all that time no one noticed it was not actually prime. This lends skepticism to the assumption that widely used, security-critical open source code is reviewed by anyone competent at evaluating it, even over long periods of time.
Here is the commit introducing the non-prime parameter (committed by Gerhard Rieger who is the same socat developer who fixed the issue today): http://repo.or.cz/socat.git/commitdiff/281d1bd6515c2f0f8984f...
The commit message reads: "Socat did not work in FIPS mode because 1024 instead of 512 bit DH prime is required. Thanks to Zhigang Wang for reporting and sending a patch." So a certain Zhigang Wang presumably chose this prime. Who is he?
Apparently he is an Oracle employee involved with Xen and Socat. Here is a message he wrote: http://bugs.xenproject.org/xen/bug/19
So why has Gerhard seemingly not asked Zhigang how he created the parameter?