Socat: “the hard coded 1024 bit DH p parameter was not prime” (openwall.com)
321 points by mrb on Feb 1, 2016 | 191 comments



It irks me that in security advisories that fix a possible backdoor—like here—sometimes no root cause analysis is done or communicated to the public. Who chose this parameter? Who wrote the code? Who committed it? So I did a little sleuthing...

Here is the commit introducing the non-prime parameter (committed by Gerhard Rieger who is the same socat developer who fixed the issue today): http://repo.or.cz/socat.git/commitdiff/281d1bd6515c2f0f8984f...

The commit message reads: "Socat did not work in FIPS mode because 1024 instead of 512 bit DH prime is required. Thanks to Zhigang Wang for reporting and sending a patch." So a certain Zhigang Wang presumably chose this prime. Who is he?

Apparently he is an Oracle employee involved with Xen and Socat. Here is a message he wrote: http://bugs.xenproject.org/xen/bug/19

So why has Gerhard seemingly not asked Zhigang how he created the parameter?


I'm pretty sure that when you generate a prime you're using the Miller–Rabin primality test, in which case you only get a probable prime.

In fact, the is_prime functions in openssl don't check that a number is definitely prime. They only check that it is prime with probability at least 1 - 2^-80. I'm not sure what the implications are, though.

See https://www.openssl.org/docs/manmaster/crypto/BN_generate_pr...


This number (removed, as it seemed to cause problems in some browsers, sorry) fails the Fermat test for base 2 (i.e. 2^(p-1) is not 1 mod p). I can't believe it would pass Miller-Rabin. Edit: it fails all bases up to 1000.
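
For reference, here's that base-2 check as a quick Python sketch (in a code block this time); the hex digits are the bytes of socat's dh1024_p array as quoted further down the thread:

  # Base-2 Fermat check: a prime p must satisfy 2^(p-1) = 1 mod p.
  p = int(
      "CC17F2DC96DF59A446C53E0E" "B826550CE388C1CEA7BCB3BF"
      "1694D8A945A2CEA95B22255F" "9259941C22BFCBC8C857CBBF"
      "BC0EE840F98703BF609B08C6" "8E99C605FC00D66D90A8F5F8"
      "D38D43C88F7ABDBB28AC0469" "4A0B867337F06D4F04F6F5AF"
      "BFAB8ECE75534D7F7D17780E" "12464AAF9599EFBCA6C54177"
      "437AB9EC8E073C6D", 16)
  print(pow(2, p - 1, p) == 1)   # prints False: p fails the base-2 test, so it cannot be prime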


Congrats, you broke HN.


What, how?


Stick the number in a code block (4 spaces at the beginning of the line). The CSS here doesn't handle long strings well.


2^-80 is an incomprehensibly tiny number. Malice or incompetence are both FAR more likely.


That's assuming you are given a single number, you test it, and is_prime says it is prime.

If on the other hand you are looking for a number that is_prime says is prime, and you are iterating through candidates, you need to know how likely it is to find a prime number in the first place to tell you how unlikely this is. In most cases the chance of a false positive will be much, much higher than 2^-80.

If, for example, you only expected to find a true prime in 1 out of every 2^80 numbers anyway, there would be a 50% chance that a number you found is prime and a 50% chance that it is not.

https://en.wikipedia.org/wiki/Bayes%27_theorem


There are approximately x / ln(x) primes below x. That means that there are 2^1024 / (1024 ln 2) - 2^1023 / (1023 ln 2) = 1.3e305 1024-bit primes. There are 2^1024 - 2^1023 = 9.0e307 1024-bit numbers. So, the chance that a randomly-selected 1024-bit number is prime is a little higher than one in a thousand, that is, a little higher than 2^-10.

So Bayes' theorem doesn't save you here: it is still statistically unlikely that you will stumble upon a 1024-bit pseudoprime by mistake.
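
A back-of-the-envelope version of that estimate in Python, via the prime number theorem pi(x) ~ x / ln(x):

  from math import log

  # density of primes among 1024-bit numbers:
  # (pi(2^1024) - pi(2^1023)) / 2^1023 ~ 2/(1024 ln 2) - 1/(1023 ln 2)
  density = 2 / (1024 * log(2)) - 1 / (1023 * log(2))
  print(density, 2**-10)   # ~0.0014 vs ~0.00098: a bit above one in a thousand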


"There are 2^1024 - 2^1023 = 9.0e307 1024-bit numbers"

This is the count of 1023-bit numbers [(x - x/2) = x/2]. 0 is a valid leftmost digit.


The standard usage in the field of crypto is that a phrase like "1024-bit prime" means a number between 2^1023 and 2^1024. For symmetric keys, yes, a "128-bit key" can start with a 0 and can even be all zeros. But for integers with mathematical properties like prime numbers, there's a big difference in the ability to e.g. factor the product of two "1024-bit primes" randomly chosen from [2^1023, 2^1024) and the product of two "1024-bit primes" randomly chosen from [0, 2^1024).

It matches the common-English usage of a phrase like "a six-figure salary." A salary of $020,000 isn't what's meant by the phrase.

You can verify this by, say, running `openssl genrsa 1024 | openssl rsa -noout -text` a few times and looking at the generated prime1 and prime2. They each have the 512th bit set. (They seem to be printed with a leading hex "00:", but there are 512/8 = 64 bytes afterwards, and the first byte always has the high bit set.)


I don't think Bayes' theorem applies very well here, because there's no practical way to check enough numbers such that the effect you're describing comes into play. 2^80 is approximately a million billion billion (10^24), so if you had a million CPUs each checking a billion numbers every second, it would still take a billion seconds (over 30 years) to check that many numbers. I suspect that's why such a small level of uncertainty was chosen in the first place.


Why would 10^24 be a million billion billion? It's 4 blocks of 6 zeros each, so that I'd expect you to call it a billion billion billion billion, if anything.


1 million = 10^6. 1 billion = 10^9. 6+9+9 = 24.

This isn't rocket science.


Marilyn Vos Savant got a question on her column I'll never forget. The answer was awesome ..

Q: My friend got a perfect hand in bridge. What are the odds of this happening?

A: Your friend is a liar.


You know, I never know what to make of that logic - what if that tiny probability hit exactly this one time? It's not like we saw it happen twice, and it could happen at some point. To my gut it seems you can't really know until you have other positive or negative observations.

I wonder if someone has compiled a list of very improbable events that have been observed.


For many years, in the computer science lab at the college where I sort of work, there was a "serious joke" written on the wall which said, with much better wording and some math to back it up, that the difference between a mathematician and an engineer is that the former was more concerned that a probabilistic primality test could inherently fail while the latter was more concerned that even a guaranteed algorithm was actually more likely to return the wrong answer because the computer was hit by a cosmic ray while it was determining if the number were prime.



2^-80 (~10^-24) is about as likely as being hit by a meteorite (~10^-16)[1] at exactly the same moment you are learning that you won the Powerball (~10^-8). Alternatively, it is as likely as winning the Powerball in three consecutive drawings.

[1] - https://what-if.xkcd.com/19/



We happily incarcerate and (for countries that do so) execute people based on much weaker evidence. 2^-80 is absurdly small, to the point that essentially any other possibility is more likely.

You say you want to see it twice, but one data point with an error rate of 1 in 2^80 is, statistically, about a billion times more convincing than a million observations with an error rate of 1 in a million.

(That being said, there are a lot of explanations that don't involve malice, including honest error, bugs in any of the software products involved in the process, etc. But no, I don't believe that this was a 1 in 2^80 fluke.)


There are countless other possible causes that are much more probable than M-R returning a false positive. What should we do about all of them? Start ruling them out, one by one, sure, but in which order? Common sense says we should do it in the order of decreasing prior probability. The hypothesis "M-R returned a false positive" is there somewhere, in the space of all possible hypotheses, and we'll surely encounter it after all other more likely alternatives are exhausted. I bet we don't have to go that far.


My own list is of length 1 as a category of events. Back in July (it's a bit harder now), the odds that someone's calculated sha256 sum solved a new bitcoin block were around 2^-68. This very tiny probability event was observed on average once every 10 minutes.

2^-80 is absurdly small. But that only means something useful if the number of attempts isn't absurdly large.


I get what you're saying, but 2^-80 is very VERY improbable. Hitting it once is equivalent to a merely "very unlikely" event like winning powerball happening multiple times in a row.


Wow - this has got to be my most downvoted comment. I can't edit the original anymore, so here's my update:

I guess the harsh reaction came from the fact that I didn't define the scope very well: My question wasn't in reference to M-R specifically, just in general. I understand that in this case it makes sense to look at likelier causes (see Sharlin's response).

My point was that it's interesting to look at what happens (or what our reaction is) when very very very improbable events do happen. It seems weird to go with the assumption that because something is extremely unlikely that it won't happen.

When I roll a die 20 times, I get a particular arrangement of numbers. Given the total number of arrangements possible, that particular arrangement is extremely unlikely, yet I just got it.

A guy got struck by lightning 7 times (https://en.wikipedia.org/wiki/Roy_Sullivan). The odds of any given person getting struck by lightning are 1 in 10000. Seven times in a row is 1 in 2^93. But then when you start drilling down, you see that he's a park ranger, and that he's out while lightning happens, which makes the probability that he'll get struck much higher.

If I had phrased the question to you asking what the likelihood of any given person in the world being struck seven times was, you could have calculated the former and said 2^-93 is such a small probability that it's not worth thinking about - and yet here is Roy Sullivan, so there's some sort of conflict in my logic. What's wrong with the former calculation?

Why is it that for any given person the probability is 2^-93 but for Roy it's somehow different, even though he is a "given person"? Is it that the 1 in 10000 number was wrong? But then if we look at all the people who never once got struck, it seems about right. If we inflate that number to 1 in 100 to make Roy likelier to get 7 in a row, then it seems everyone also should be getting shocked more often at least once or twice.

Or maybe it's that somehow the probability changes when we have more information and those two numbers and situations are not comparable on an absolute scale. Maybe if you get hit twice then you're much likelier to get hit again because you're probably in some dangerous location - but how was I to know to factor this in? It seems that it's very much about how you calculate the probability. Who knows what other hidden factors could be wildly affecting the true value of the probability?

That also makes me think - is there even such a thing as the "true" or inherent probability of an event happening?

edit: Or maybe it's the law of large numbers - given enough "trials" or in this case lightning events with people around, even something with an absurdly small probability is bound to happen eventually. But then why do we never factor that in and always just call it a day with 10000^7?


The catch with your logic is that "The odds of any person getting struck by lightning is 1 in 10000" is sloppy phrasing. Of the number of people studied, one ten-thousandth of them got struck by lightning, but lightning does not, as you have observed, choose people at random among those 10,000. Some people are constantly indoors; some are constantly outdoors. Some live in places with frequent thunderstorms; some don't. The only way to get that 1/10000 probability is to pick a person randomly. Picking Roy Sullivan isn't random; neither is picking me.

"The probability of a randomly-selected 1024-bit number passing this primarily test without being prime is 2^-80" is a much more precise statement, because you have selected that number randomly. Obviously, if you have a specific number in mind, the probability of that number being a false positive is either 0 or 1. It either is prime, or it isn't.

Remember that there is no such thing as a random number; there is only a randomized process for selecting numbers.


You've touched upon an old conflict. http://lesswrong.com/lw/oj/probability_is_in_the_mind/


There's no need to refer people to lesswrong for this. The interpretation of probability as a subjective degree of knowledge/evidence is well-known. For example: http://plato.stanford.edu/entries/probability-interpret/ or http://plato.stanford.edu/entries/epistemology-bayesian/

No need to spam links to that cult site.


"It seems weird to go with the assumption that because something is extremely unlikely that it won't happen."

No, it's rational.

As for the rest of your post, you are confused between a priori and a posteriori probability.


"Or maybe it's that somehow the probability changes when we have more information"

Um, yes, that's all that probability is. See Bayes' theorem.


> Why is it that for any given person the probability is 2^-93 but for Roy it's somehow different, even though he is a "given person"?

Because it isn't true that the probability of any given person being struck by lightning is 1/10000. For instance, the internet says that men are around 4 times as likely to be struck by lightning as women. Rather, what's true is that the likelihood of a randomly selected person being struck by lightning is 1/10000. Each individual person has a different likelihood to be struck. I don't know how to calculate it for Roy, but given that he's been struck 7 times I'm sure it's way higher than 1/10000.

> That also makes me think - is there even such a thing as the "true" or inherent probability of an event happening?

I love when people email me after reaching an enlightenment.

There was a probability question that drove me nuts until I figured out what was going on. The question is "A woman has two children. One of her children is a boy. What is the probability that she has two boys?".

The question isn't well defined, because it matters how you learned that one of her children is a boy.

If you asked her "what is the gender of your oldest child?", and she says it's a boy, then the probability that she has two boys is 1/2.

But if you asked her "do you have at least one boy?", and she says yes, then the probability that she has two boys is 1/3.

I don't know what the lesson here is. Probability is hard? Always ask about the experiment?
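
If it helps, here's a quick Monte Carlo sketch of the two readings in Python:

  import random

  families = [(random.choice("BG"), random.choice("BG")) for _ in range(10**6)]

  # "What is the gender of your oldest child?" -> "a boy"
  oldest_boy = [f for f in families if f[0] == "B"]
  print(sum(f == ("B", "B") for f in oldest_boy) / len(oldest_boy))        # ~0.5

  # "Do you have at least one boy?" -> "yes"
  at_least_one = [f for f in families if "B" in f]
  print(sum(f == ("B", "B") for f in at_least_one) / len(at_least_one))    # ~0.33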

> is there even such a thing as the "true" or inherent probability of an event happening?

The probability of something depends on what you know. What's the chance that Roy will be struck by lightning tomorrow? If you don't know who Roy is, you might say 1/(10000 * 28000) (where 28000 days is the average human lifespan.). If you do know Roy's history, you'll probably bump that estimate up quite a bit. But if you look up the weather in his park tomorrow and see that it will be sunny all day, the probability will go back down close to zero.

Some things are so hard to know they might as well have an inherent probability, though. If you roll a die, no one's going to predict the outcome of the roll, so we might as well say it's inherently got a 1/6 chance of rolling a 5. And if you shine a photon at a half-silvered mirror, it's actually impossible to predict which way it will go, so I guess that really does have an inherent probability.

You've got to watch out for your probability estimates being wrong, though. I thought the probability of my friend flipping a coin and getting heads was 1/2, until he demonstrated that he could flip heads 10 times in a row.

> edit: Or maybe it's the law of large numbers - given enough "trials" or in this case lightning events with people around, even something with an absurdly small probability is bound to happen eventually. But then why do we never factor that in and always just call it a day with 10000^7?

Because even with 10 billion people, each living a billion days, and having a billion things happen to them in a day... well coincidentally that's exactly 10^28=10000^7, so never mind.



Your gut is bad at statistics.


The likelihood of 1/2^80 is on the same order as me picking out a thousand grains of sand from the Sahara desert, spreading them randomly out through the desert and having you pick out those exact thousand grains of sand.

It's a practical impossibility, a philosophical exercise.


No, it's not the same order. 1/2^80 is on the same order as the likelihood of picking 1000 specific grains out of 1010, and I'm sure you'll agree that the Sahara has more grains than 1010.

However, if you picked a single grain from the Sahara desert, you'd be only 2 or 3 orders of magnitude off, so one could say that 2^-80 is only slightly easier than finding a particular grain in the Sahara desert.
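
For the curious, the grain-counting arithmetic (Python 3.8+ for math.comb):

  from math import comb, log2

  # choosing 1000 specific grains out of 1010 is roughly a 1-in-2^78 event
  print(log2(comb(1010, 1000)))   # ~78.2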


Yes, but 1/2^80 is ridiculously large compared to 1/2^1024.


Given that there is so much probability in the world, you would think you would want to be certain about things you can be objectively 100% certain about, especially for one-off, long-term choices that have big implications.


2^-80 is orders and orders of magnitude less likely than the chance of a bit error in RAM when performing the calculation. Or of fucking up the implementation of such a check. Or of a CPU error.

https://blogs.msdn.microsoft.com/oldnewthing/20160114-00/?p=...


So how did it happen in this case?


> 2^-80 is an incomprehensibly tiny number.

When one wants a 128-bit security margin, 2^-80 is 2^48 times too big.


This is not a valid comparison.

An attacker who can bruteforce, say, 2^80 128-bit keys (approximate limit of the computational power of the largest adversaries) has 1 chance out of 2^48 to break the security.

But an attacker has only 1 chance out of 2^80 that this parameter is a non-prime.

2^80 is much larger than 2^48, therefore it is not a problem.


It's not about bruteforcing.

If I want a 128-bit security level, I'm willing for a random guess of my key to have a 2^-128 chance of being correct. I'm likewise willing for a prime I generate to have a 2^-128/uses chance of not being prime (where uses is equal to the number of times I'll actually be using it).

I can't be 100% certain that a large number is actually prime. But I can be certain enough. Having a 2^-80 chance of being wrong isn't good enough when I want a 2^-128 security level.


Yes, security margins are all about what can and cannot be brute forced!

You can't compare these 2 scenarios directly, because in practice:

- attackers can take as many guesses as they want against a piece of data protected by a 128-bit key, but

- attackers get only 1 chance that a particular piece of data protected by a supposed prime is not a prime.


Apples and oranges. When we talk about 128-bit security, we mean that it takes ~2^128 work to break it; not that there's a 2^-128 chance that it is broken.


Most protocols do allow the attacker to choose how many times the defender must win at some game of probability.

If the defender is somehow put in a situation to generate 2^80 primes, then he's in trouble.


Sure, I'd aim for a higher security level in a "attacker can keep asking for new primes until we screw up" scenario.


> When we talk about 128-bit security, we mean that it takes ~2^128 work to break it; not that there's a 2^-128 chance that it is broken.

But it also implies that e.g. the attacker has a 2^-128 chance of randomly guessing a key.


Except for a possible bug, 2^-80 is effectively zero, and getting a non-prime from this routine is effectively impossible.


Would it be reasonable to say that the following scenarios are more probable?

* The specific version of the software that was used to generate the non-prime contained a bug in its implementation of the algorithm

* The specific hardware upon which the software was run was flawed in a way that generated an incorrect output number from the program (i.e. on-disk corruption, RAM corruption, etc., undetected by the checksums in the hardware)


Both of those scenarios are more probable. But neither is anywhere nearly as probable as human error or human malice.

In the case of human error, it could be as simple as a bad cut-and-paste. In that case the number likely has a half-dozen or so factors, some of which should be found relatively easily.

Malice is, of course, impossible to verify.


It would be fantastic to see this fix accompanied by a patch that adds a test that runs that primality check again.

It's a check that probably doesn't need to run in CI over and over again, but boy would it be nice to have it codified so everyone reading now could reproducibly run the check if we felt the desire.


There are one-in-a-thousand chances, and then there are 1-in-2^80 chances. Wouldn't that be some "electrons in the universe" level coincidence? Believing in that borders on religion.


That would be 10^80 which is much, much larger than 2^80.

Unrelated, but 2^80 isn't that astronomic when it comes to computers, considering that the entire Bitcoin network has performed 2^84 hashes.


> I'm pretty sure that when you generate a prime you're using the Miller–Rabin primality test in which case you only probabilistically choose a prime.

Why would somebody use MR when there are faster sieves already running, as well as additional public lists of known large primes?

Maybe they shouldn't be near primes, let alone picking them.


2^1024 is a lot larger than 2^80. The implication is clear.


  # Run Miller-Rabin on the prime in the blink of an eye:
  from gmpy2 import mpz, is_prime

  p_list = [0xCC, 0x17, 0xF2, 0xDC, 0x96, 0xDF, 0x59, 0xA4, 0x46, 0xC5, 0x3E, 0x0E, 0xB8, 0x26, 0x55, 0x0C, 0xE3, 0x88, 0xC1, 0xCE, 0xA7, 0xBC, 0xB3, 0xBF, 0x16, 0x94, 0xD8, 0xA9, 0x45, 0xA2, 0xCE, 0xA9, 0x5B, 0x22, 0x25, 0x5F, 0x92, 0x59, 0x94, 0x1C, 0x22, 0xBF, 0xCB, 0xC8, 0xC8, 0x57, 0xCB, 0xBF, 0xBC, 0x0E, 0xE8, 0x40, 0xF9, 0x87, 0x03, 0xBF, 0x60, 0x9B, 0x08, 0xC6, 0x8E, 0x99, 0xC6, 0x05, 0xFC, 0x00, 0xD6, 0x6D, 0x90, 0xA8, 0xF5, 0xF8, 0xD3, 0x8D, 0x43, 0xC8, 0x8F, 0x7A, 0xBD, 0xBB, 0x28, 0xAC, 0x04, 0x69, 0x4A, 0x0B, 0x86, 0x73, 0x37, 0xF0, 0x6D, 0x4F, 0x04, 0xF6, 0xF5, 0xAF, 0xBF, 0xAB, 0x8E, 0xCE, 0x75, 0x53, 0x4D, 0x7F, 0x7D, 0x17, 0x78, 0x0E, 0x12, 0x46, 0x4A, 0xAF, 0x95, 0x99, 0xEF, 0xBC, 0xA6, 0xC5, 0x41, 0x77, 0x43, 0x7A, 0xB9, 0xEC, 0x8E, 0x07, 0x3C, 0x6D]
  p = mpz(0)  # Edit: should be zero, not 1
  for num in p_list: p = (p << 8) + num
  print("Q: Is p a prime?\nA: %s" % (is_prime(p) and 'Yes' or 'No'))


For anyone following along at home, you missed a D at the very end (0x6D). A number whose hexadecimal representation ended in 6 couldn't be prime because it is even.

(final digit is divisible by d ←→ number is divisible by d works for any d that is a divisor of the base the number is written in)


Thanks for the edit. Of course an even number isn't prime ;)


There is one even prime. ;-)


And one even primer.


"all primes are odd, and two is the oddest of all."


... that ruined it for all the other even numbers :)


What's worse, a 512-bit DH prime or a 1024-bit not-quite-prime?


> Here is a message he wrote

In that message, Zhigang explicitly states: "I don't have enough knowledge to implement the merge." Yet the code was accepted without the slightest review whatsoever.


Q: How does p not being a prime => backdoor?

A: p not being a prime means two things:

* subgroup confinement attacks (where you send a public key made with a fake generator g) should be able to take place if the code is weak -> this is because there must be low order subgroups.

* the generator g might not have large order. This can easily be tested if you know how to factor p: the order of the multiplicative group (Zp)* is Euler's totient function of p. If you know the order of the multiplicative group, then you have an algorithm to find the order of your generator: you try the divisors of the group's order and see which is the smallest one that works.

Unfortunately, if you don't know how to factor p then you can't easily do that.

Another question is: How can they know it's not a prime if they don't know the factorization of p? We have efficient provable tests for that: they tell you if p is prime or not and nothing else.
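
For the second point, here's a toy Python sketch of the order-finding step (the standard refinement of "try the divisors"); it assumes you already know the factorization of the group's order, which is exactly the hard part here:

  def element_order(g, p, group_order, group_order_factors):
      # group_order_factors: dict mapping each prime factor q of group_order
      # to its exponent e; works for prime or composite modulus p
      order = group_order
      for q, e in group_order_factors.items():
          for _ in range(e):
              if pow(g, order // q, p) == 1:
                  order //= q
              else:
                  break
      return order

  # Toy example in (Z23)*, which has order 22 = 2 * 11; 2 generates the
  # subgroup of order 11.
  print(element_order(2, 23, 22, {2: 1, 11: 1}))   # 11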


p not being prime means it has not been generated using a standard tool like OpenSSL.

It's therefore likely it's been handcrafted for nefarious purposes, opening the possibility of further shenanigans.

Another possibility would be the use of a buggy generator, or a clueless dev using random bits rather than a large prime.


Or someone made a mistake copying and pasting.

Edit: wow. that is serendipity. Just accidentally posted twice. Does hitting "submit" twice do that?


> Edit: wow. that is serendipity. Just accidentally posted twice. Does hitting "submit" twice do that?

I get that a lot. I've not worked out what's causing it.


The way this post is written, it sounds like you are asking these questions, but then you go on to answer them. It's confusing. It took me a second read to realize you are actually not calling the weakness into question but explaining why it really is a weakness.


edited to clarify :) thanks


> We have efficient provable tests for that: they tell you if p is prime or not and nothing else.

I think this is a typo -- the efficient general-form tests are probable rather than provable.

https://en.wikipedia.org/wiki/Primality_test#Probabilistic_t...


There are exact tests that run in polynomial time, like https://en.wikipedia.org/wiki/AKS_primality_test


I hate to say that a polynomial-time test isn't efficient because many people use that as the very definition of efficient, but my understanding is that AKS is incredibly impractical because the polynomial is ginormous, even though its asymptotic behavior is nice. So if you actually wanted to know if, say, a 1024-bit number was definitely prime, you wouldn't be able to run AKS on it in a "reasonable time" on a real computer.


I haven't actually studied to what extent the AKS tests are do-able. I always figured there would be no problem running one for a 1024 bits prime. Found this on SO: http://cs.stackexchange.com/questions/23260/when-is-the-aks-...

Also, to further the discussion on probable vs provable: the probable tests are enough in our case because they tell us _provably_ if an integer is not a prime (which is what we care about here), but only _probably_ if an integer is a prime (which we don't care about here).


This is not conclusive, but the best deterministic primality test (an AKS variant by Pomerance and Lenstra) runs in roughly the 6th power of the bit length. 1024^6 is quite large.


Just because there are exact tests doesn't make the AKS primality test usable in the real world, because of constant factors. Using Fermat's little theorem for primality testing is fine because the probability of encountering a Carmichael number is very low.


You're right, AKS is not a very efficient algorithm and randomized tests are generally good enough. But, there are exact tests which do run fast in practice, such as ECPP. Here's some of the largest primes found using ECPP: http://primes.utm.edu/top20/page.php?id=27

Note the largest there is 30,950 digits, which is about 102,813 bits unless I did my math wrong, so I bet it is usable for 1024-bit numbers. Non-exact methods are much much faster of course, but when you really really need exactness it is an option.


That does seem pretty efficient for numbers of this size; thanks for the example!


Seems like a Carmichael number could be a good choice for an attacker. This would make your chances of encountering a Carmichael number quite high. You have to consider who's choosing the number. Or do I have it wrong?


Only the simple Fermat tests are vulnerable to Carmichael numbers. The test which is used in practice, Rabin-Miller, is not vulnerable to them. In fact, if you assume the extended Riemann hypothesis is true, (log n)^2 iterations of Rabin-Miller are sufficient to prove primality.


How efficient might this be compared to other approaches for searching for Riemann hypothesis counterexamples? Is there a one-to-one relationship between direct Riemann counterexamples and composite numbers that pass (log n)² Rabin-Miller tests?


I am not familiar with all the ways the GRH can be disproven, but Miller-Rabin doesn't strike me as the most efficient, with its running time of O((log n)^4) for each attempt. I am not sufficiently competent to say whether a GRH counterexample also directly results in a MR counterexample.


In addition to what pbsd said, Carmichael numbers are incredibly rare. Your odds of stumbling across one by chance are lower than the odds of a cosmic ray flipping the bit of memory that causes you to believe you have stumbled across one, as I recall.


From your own reference (https://en.wikipedia.org/wiki/Primality_test#Probabilistic_t...): "the usual randomized primality tests never report a prime number as composite".

So I'd say that's probably what happened here (i.e. p was reported composite by a randomized primality test, and that would never have happened had it been prime).


Oh sure, I don't mean that there's uncertainty that this number is composite, just that in a formal sense there isn't "proof" that other numbers that passed, say, openssl -checks 100000 are prime. But I wouldn't consider it unsafe to use them for cryptography.

The comment I was replying to referred to tests that "tell you if a number is prime or not", and the probable ones don't exactly always do that. :-)


They can provably tell you that a number is composite.


Keep up the tests until you have ten or twenty nines of certainty. I'd call that proof.


then you should re-read the definition of a proof.


It's clear that you haven't read it at all:

http://www.merriam-webster.com/dictionary/proof

The fact of the matter is that the sort of Platonic definition of proof that you fantasize is impossible because it depends on certainty, which itself cannot be proven. The best we can do is fail to find an error in an alleged proof.

Consider the proofreader's paradox: A proofreader can be prepared to swear, for any given page of a 1000 page book, that there are no typos on that page, but no proofreader is so foolish as to claim that there are no typos in the book.


What more are you hoping for?

Given a mathematical proof, the chance of an error in the proof is vastly larger than 10^-20.


>> Unfortunately, if you don't know how to factor p then you can't easily do that.

The link says they don't know where p came from. Presumably someone constructed it as a product of primes known only to them. I don't recall the state of the art in factorization, but if 1024-bit numbers can be factored easily, that's news to me. So the weakness would only be exploitable by whoever created p.

Why nobody checked the primality of it IDK.


One of the factors is 3684787 = 271 x 13597, so I suspect this is more accidental than malicious. The generator, 2, does not seem to have pathologically small order, but I didn't check very far.


Coincidentally, 13597 base 10 is 351d hex, which in unicode is defined as 'strong resistance'.

http://www.charbase.com/351d-unicode-cjk-unified-ideograph


Wow. 271 is a factor.

Why not try dividing by all 32-bit numbers, to at least filter easy cases? Shouldn't take more than a few seconds.


The following Sage script shows that p has no prime factors less than 2^32 other than 271 and 13597:

  p_list = [0xCC, 0x17, 0xF2, 0xDC, 0x96, 0xDF, 0x59, 0xA4, 0x46, 0xC5, 0x3E, 0x0E, 0xB8, 0x26, 0x55, 0x0C, 0xE3, 0x88, 0xC1, 0xCE, 0xA7, 0xBC, 0xB3, 0xBF, 0x16, 0x94, 0xD8, 0xA9, 0x45, 0xA2, 0xCE, 0xA9, 0x5B, 0x22, 0x25, 0x5F, 0x92, 0x59, 0x94, 0x1C, 0x22, 0xBF, 0xCB, 0xC8, 0xC8, 0x57, 0xCB, 0xBF, 0xBC, 0x0E, 0xE8, 0x40, 0xF9, 0x87, 0x03, 0xBF, 0x60, 0x9B, 0x08, 0xC6, 0x8E, 0x99, 0xC6, 0x05, 0xFC, 0x00, 0xD6, 0x6D, 0x90, 0xA8, 0xF5, 0xF8, 0xD3, 0x8D, 0x43, 0xC8, 0x8F, 0x7A, 0xBD, 0xBB, 0x28, 0xAC, 0x04, 0x69, 0x4A, 0x0B, 0x86, 0x73, 0x37, 0xF0, 0x6D, 0x4F, 0x04, 0xF6, 0xF5, 0xAF, 0xBF, 0xAB, 0x8E, 0xCE, 0x75, 0x53, 0x4D, 0x7F, 0x7D, 0x17, 0x78, 0x0E, 0x12, 0x46, 0x4A, 0xAF, 0x95, 0x99, 0xEF, 0xBC, 0xA6, 0xC5, 0x41, 0x77, 0x43, 0x7A, 0xB9, 0xEC, 0x8E, 0x07, 0x3C, 0x6D]

  p = 0
  for num in p_list: p = (p << 8) + num

  for i, prime in enumerate(primes(2^32)):
      if i % 100 == 0: print str(prime) + '\r',
      if p % prime == 0: print('\nfactor: %d' % prime)


Sage also has an is_prime function, which provably shows that p is not prime: "is_prime(p)". This takes less than 1ms to run. Proving primality of a 1024-bit prime in Sage takes a few seconds, and for a 2048-bit prime about a minute. Sage uses PARI for this, with sophisticated elliptic-curve-based algorithms called ECPP ("elliptic curve primality proving"), which are non-deterministic (unlike AKS) but fast in practice and provably correct. https://goo.gl/34uxJl


Have you tried dividing the order by 271 and 13597 and searching again on that?


271 is rather small. Which means this fake prime was quite easy to generate.


"Presumably someone constructed it as a product of primes known only to them"

It would be foolish to presume that. And knowing the factors of the Diffie-Hellman parameter isn't all that important.


Some of these issues are orthogonal.

You're correct that the ring of integers mod n with n composite will have small multiplicative subgroups. But so will the integers mod p with p prime. At the very least, 1 and p-1 will always have orders 1 and 2, respectively. I could be wrong, but I don't think the primality or compositeness of the modulus alone tells you much about the smoothness of the group order.

Small subgroups may not always lead to key recovery, but they can lead to key dictation in certain protocols. So subgroup-confinement attacks are always a consideration in this setting.

If you want to learn more about subgroup-confinement attacks on DH and ECDH, check out set 8 of cryptopals. Mail set8.cryptopals@gmail.com with subject "Crazy Flamboyant for the Rap Enjoyment".


But once you have the factorization for p, since it's hardcoded, it's now much easier to break every DH key exchange used by this application. Getting that factorization would be very very difficult, but once you have it you can use it on everyone.


if it's not an easy factorization => it will be hard. According to recent results, we believe a state-sized adversary should be able to do it. If your threat model is against criminals, then you might be OK.

EDIT: if 1024-bit factorization is easy in general, you can say goodbye to every 1024-bit RSA modulus. My first statement doesn't mean it can never be easy; it means that if you try to factor it the easy way and it doesn't work... you are in for a lot of work and research.


Well, it isn't known who provided the number in the first place. Whoever that unknown party is presumably knows the factorization they used to construct p. This could be a state, or could be a criminal enterprise. Better not to trust it!


Oh right. I guess I forgot this part 8)


Testing for 2 seconds already found the prime factors 271 and 13597. It will probably not be hard.

Edit: To add some more information, p / (271 * 13597) is still not prime, however, the library I was using didn't find any new factors in 30 minutes.


There are no other factors less than 2^32. You can see my other comment for details on verifying this.


I always wonder if when things like this get found, there's someone in the NSA going "Wow they finally found it, only took them x years"


>> I always wonder if when things like this get found, there's someone in the NSA going "Wow they finally found it, only took them x years"

I would guess they were aware of the problem whether they created it or not. So yes.


> I would guess they were aware of the problem whether they created it or not.

While it's smart to assume so, it's also pretty laughable to think that the NSA has an exhaustive list of encryption vulnerabilities.


> it's also pretty laughable to think that the NSA has an exhaustive list of encryption vulnerabilities

Why not? They have enough mathematicians and cryptographers on payroll that they can analyse the major protocols and software that are used. If you look at the recent attacks on TLS - especially the downgrade attacks to export-grade ciphers - they would be stupid not to exploit this for targeted attacks.

That being said, I doubt they have very good mathematical attacks, as a lot of researchers look at this stuff, but implementation bugs, side-channel attacks and plain bugs are plentiful.

Most of the world runs OpenSSL or CryptoAPI - why not spend a few man-years looking through these implementations?

Well, at least if I were them I'd have some programmers come up with automatic bug-finding techniques and fuzz the hell out of existing applications.

They also likely don't care about your encrypted Facebook chat, but government infrastructure, mobile telecommunication providers and lots of important enterprises are probably pretty often on their "we need to get in" list. If you can decrypt captured encrypted traffic, I imagine this would be a huge net win.


Because coming up with an exhaustive list of all the encryption software in the world would be difficult, let alone enumerating all of their vulnerabilities.


For practical purposes it should suffice if they create an exhaustive list of the encryption software that is used, say, in 99% of all cases. Finding those and documenting some vulnerabilities of each is quite a realistic goal.


Sure. This is also not exhaustive, by definition.


It's very easy to detect those kinds of fuck-ups, you just have to look for them. We don't have the means to do it, and sometimes we are just ignorant or lazy. They are neither.

We assume actual human beings need to press buttons to detect obvious developer errors. I bet if an encrypted communication with any kind of bad (or even just popular default) parameter goes through anything the NSA oversees, it gets instantly attacked and put in a bin somewhere with your new software's name on it. The machine probably even picks a random name for the exploit once it finds one.


The NSA is old enough, and well resourced enough. Time and money solve these problems readily enough. In the case of FOSS, all you really need is a parser designed to hunt down specific sorts of flaws. We have static analysis for non-crypto needs; it seems reasonable that someone at the NSA got funded to write one for their use case.


If they have a complete IP traffic recording facility, they will be applying automatic classification to it and looking to make the classification as complete as possible. Anything that doesn't fit the existing categories will attract attention.


> While it's smart to assume so, it's also pretty laughable to think that the NSA has an exhaustive list of encryption vulnerabilities.

True, but in this case I'm sure they have enough hardware to factor any widely deployed primes used in crypto or semi crypto comms software. After all, that is half of the NSAs job description.


I expect you have enough hardware to factor widely deployed primes. Composite numbers might be a different story.


Definitely agree with this.


FWIW, this hardcoded "prime" had only been in use since 2015 (though before that, it was 512 bits)


Title is misleading -- this appears to be an issue with a tool called socat, not with OpenSSL. That's a world of difference in the severity of the issue.


Thanks, I fixed the title. (For the record it was "OpenSSL: the hard coded 1024 bit DH p parameter was not prime").


I thought the first line was confusing and misleading until I read yours.

> In the OpenSSL address implementation the hard coded 1024 bit DH p parameter was not prime.

It should have been worded "In Socat, the DH p parameter used by the OpenSSL implementation was hardcoded and was not a prime."


The original phrasing is correct, but confusing if you aren't familiar with socat. Socat fundamentally works by giving it a pair of addresses; they are referring to socat's implementation of addresses starting with "OPENSSL:" (caps-insensitive), AKA "OpenSSL addresses".


I see. You are right, and yes, I may have jumped the gun too quickly.


This is a vulnerability in socat's TLS support. It has nothing to do with OpenSSL (besides the fact that OpenSSL provided a footgun API by leaving it to application developers to supply DH parameters).


"Footgun API" - I like that!

It's an interesting challenge though. I wonder if the person who picked the constant the first time understood the ramifications of it being prime or not. And if they did, how hard they worked to validate its primality.


btw, they recommend checking custom params via DH_check() (https://wiki.openssl.org/index.php/Diffie-Hellman_parameters...) but, apparently, neither Apache nor nginx does this...


Very nice asymmetric backdoor!

If you happen to know the factorization and the factors are not too large (e.g. two 500-bit factors + some chaff), then you can just use the Pohlig-Hellman algorithm to solve the DLP modulo each individual factor, combine the results and recover the shared Diffie-Hellman secret.

But without this trapdoor information (and, say, if p was chosen to be a Blum integer), computing the Diffie-Hellman shared secret is as hard as factoring that modulus (see https://crypto.stanford.edu/~dabo/abstracts/DHfact.html).
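
Here's a toy Python sketch of that Pohlig-Hellman recovery (Python 3.8+ for pow(x, -1, m); names and sizes are purely illustrative, and the per-subgroup discrete logs are brute-forced, so it only works at toy scale):

  def dlog_bruteforce(g, h, p, order):
      cur = 1
      for x in range(order):
          if cur == h:
              return x
          cur = cur * g % p
      raise ValueError("no discrete log found")

  def crt_pair(r1, m1, r2, m2):
      # combine x = r1 (mod m1) and x = r2 (mod m2), with m1, m2 coprime
      return (r1 + ((r2 - r1) * pow(m1, -1, m2) % m2) * m1) % (m1 * m2)

  def pohlig_hellman(g, h, p, order_factors):
      # order_factors: {q: e}, the full factorization of g's order
      n = 1
      for q, e in order_factors.items():
          n *= q**e
      x, m = 0, 1
      for q, e in order_factors.items():
          qe = q**e
          g_i = pow(g, n // qe, p)   # lands in the subgroup of order dividing q^e
          h_i = pow(h, n // qe, p)
          x, m = crt_pair(x, m, dlog_bruteforce(g_i, h_i, p, qe), qe), m * qe
      return x

  # Toy check in (Z23)*: 5 generates the full group of order 22 = 2 * 11.
  p, g = 23, 5
  h = pow(g, 13, p)
  print(pohlig_hellman(g, h, p, {2: 1, 11: 1}))   # 13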


Someone should really write a set of unit tests available in every conceivable language, marked as "Please copy this unit test into your test base, and use it to verify all your primes are prime".

You can even make it a bit fuzzy - Miller-Rabin uses random numbers, right? So make it so that every time the unit test is run, it generates new random values. Your test won't be deterministic, but it will fail at least some of the time, which should be enough to raise an alarm of a problem.
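
A rough sketch of what such a test could look like in Python; HARDCODED_DH_PRIMES is a placeholder for wherever a project keeps its constants, and it assumes gmpy2 is available:

  import random
  import gmpy2

  HARDCODED_DH_PRIMES = []   # paste your project's DH moduli here

  def test_hardcoded_dh_moduli_are_prime():
      for p in HARDCODED_DH_PRIMES:
          # gmpy2.is_prime wraps GMP's Miller-Rabin based probable-prime test
          assert gmpy2.is_prime(p, 50)
          # extra fuzz in the spirit of the comment above: Fermat checks with
          # base 2 plus a few random bases, different on every run
          assert pow(2, p - 1, p) == 1
          for _ in range(5):
              a = random.randrange(2, p - 1)
              assert pow(a, p - 1, p) == 1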


I'm aware that factoring a composite into primes is one of the most difficult computational problems, but isn't it cheap to determine if a number is prime? a^(p-1) == 1 mod p (where == means "congruent to"). With modular exponentiation, isn't that simple to compute?


The Fermat little theorem test requires you to test many bases "a" in order to weed out false positives. For most numbers, testing a few thousand bases is enough, but Carmichael numbers break the test.

There are better primality checks, but they all have downsides (either slow but provable, or fast but probabilistic). Proving primality can be shown to be easier than finding factors (see how GIMPS checks for primality), but that doesn't make it easy.


isPrime(p) is polynomial in the number of digits of p, but not in any useful way - it takes way too long, practically speaking. But that's only if you want a definite answer.

If you're willing to do a probabilistic test, you can use algorithms like Miller-Rabin - the more times you run it with random bases, the more sure you are that the number is prime. And it's relatively easy to implement - check Wikipedia for the pseudocode.
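
Something along these lines (a minimal sketch; each round lets a composite survive with probability at most 1/4, so 40 rounds gets you past the 2^-80 mark):

  import random

  def probably_prime(n, rounds=40):
      if n < 2:
          return False
      for small in (2, 3, 5, 7, 11, 13):
          if n % small == 0:
              return n == small
      d, r = n - 1, 0
      while d % 2 == 0:
          d //= 2
          r += 1
      for _ in range(rounds):
          a = random.randrange(2, n - 1)
          x = pow(a, d, n)
          if x in (1, n - 1):
              continue
          for _ in range(r - 1):
              x = pow(x, 2, n)
              if x == n - 1:
                  break
          else:
              return False   # definitely composite
      return True            # probably prime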


Nice catch. Now, the real question is who committed it, and how they came up with the number.

If it is a backdoor, it's pretty smart, because it's very deniable as a "stupid mistake". And if it's a stupid mistake, it's extra stupid, because this committer will have trouble convincing the world that that's all it was. At the very least, somebody needs to be going through all this person's commits with a fine-toothed comb.

How situations like this get handled in the face of a known threat is going to be interesting. You hate to ban or hinder a good programmer from a project, but once possibly bitten, twice shy.


Aren't large non-primes usually created by multiplying two large but smaller primes together? Factoring is then the challenge. Or is there more to this that I'm missing?


The question is whether they:

a) knew it was non-prime, and used it to weaken the crypto

b) knew it was non-prime, and used it because they didn't think it needed to be prime (which is a massive sin of ignorance)

c) grabbed 1024 bits of rand() and didn't check if it was prime (again, stupid)

d) grabbed some rand and checked the prime-ness using a bad method

e) used a "prime number generator" that produced bad output

I agree that making non-prime numbers is not terribly difficult, but the question of how they got the number is only interesting in that it gives info about why.


f) Used a machine with bad RAM that flipped a bit.

This case could actually be tested for - see if any of the one-bit differences from the number used are prime.


A bit could have been flipped in the software or the result of the function (true/false), but in the number itself there appear to be no single bit flips that make it prime (at least in the binary representation).

Edit: A single bit flip could have been used as "semi" plausible deniability in the case of malicious intent.
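
Roughly what that check looks like in Python, with p standing for the old socat constant (built, e.g., from the dh1024_p byte array quoted elsewhere in the thread) and gmpy2 supplying the probable-prime test:

  from gmpy2 import is_prime

  def prime_single_bit_flips(p, bits=1024):
      # bit positions whose flip would turn p into a probable prime
      return [i for i in range(bits) if is_prime(p ^ (1 << i))]

  # prime_single_bit_flips(p) reportedly comes back empty for the socat value.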


Then again, this is wholly academic. It's not like, if it's A, the committer is going to say "yup im NSA u caught me lol."


it is the same clear path as unix vs closed source research. perhaps there will be some achievements privately, but all mind and matter is interco-mingled already and forever. secrets are just another way of saying "obscure path" ... feels?


>there is no indication of how these parameters were chosen

Is there really no protocol used in projects undertaken by the security community that would ensure that each component of the tools we rely on has a known history?


Well, socat has a git repo at git://repo.or.cz/socat.git so you can see the history of the prime number. In particular, it was upgraded from a 512 bit prime to a 1024 bit "prime" in commit 281d1bd on Jan 23, 2015.

Neither the code then nor the new code in this patch has comments indicating how the prime was generated. (The advisory only mentions that openssl dhparam was used for the recent patch.)


Might be an interesting exercise to search through github for large numeric constants and find ones that aren't prime, then manually check through those for ones that are supposed to be.


This is the reason RFC 5114 exists.

http://tools.ietf.org/html/rfc5114


Has anyone checked the new number is a prime?


Some community-vetted larger DH parameters that software developers can use:

https://datatracker.ietf.org/doc/draft-ietf-tls-negotiated-f...


Previously:

    915       static unsigned char dh1024_p[] = {                                                           
    916      0xCC,0x17,0xF2,0xDC,0x96,0xDF,0x59,0xA4,0x46,0xC5,0x3E,0x0E,
    917      0xB8,0x26,0x55,0x0C,0xE3,0x88,0xC1,0xCE,0xA7,0xBC,0xB3,0xBF,
    918      0x16,0x94,0xD8,0xA9,0x45,0xA2,0xCE,0xA9,0x5B,0x22,0x25,0x5F,
    919      0x92,0x59,0x94,0x1C,0x22,0xBF,0xCB,0xC8,0xC8,0x57,0xCB,0xBF,
    920      0xBC,0x0E,0xE8,0x40,0xF9,0x87,0x03,0xBF,0x60,0x9B,0x08,0xC6,
    921      0x8E,0x99,0xC6,0x05,0xFC,0x00,0xD6,0x6D,0x90,0xA8,0xF5,0xF8,
    922      0xD3,0x8D,0x43,0xC8,0x8F,0x7A,0xBD,0xBB,0x28,0xAC,0x04,0x69,
    923      0x4A,0x0B,0x86,0x73,0x37,0xF0,0x6D,0x4F,0x04,0xF6,0xF5,0xAF,
    924      0xBF,0xAB,0x8E,0xCE,0x75,0x53,0x4D,0x7F,0x7D,0x17,0x78,0x0E,
    925      0x12,0x46,0x4A,0xAF,0x95,0x99,0xEF,0xBC,0xA6,0xC5,0x41,0x77, 
    926      0x43,0x7A,0xB9,0xEC,0x8E,0x07,0x3C,0x6D,
    927       };

  $ echo 'isprime(143319364394905942617148968085785991039146683740268996579566827015580969124702493833109074343879894586653465192222251909074832038151585448034731101690454685781999248641772509287801359980318348021809541131200479989220793925941518568143721972993251823166164933334796625008174851430377966394594186901123322297453)' | gp -q
  0
With fix

    xio-openssl.c
    915       static unsigned char dh2048_p[] = {
    916      0x00,0xdc,0x21,0x64,0x56,0xbd,0x9c,0xb2,0xac,0xbe,0xc9,0x98,0xef,0x95,0x3e,
    917      0x26,0xfa,0xb5,0x57,0xbc,0xd9,0xe6,0x75,0xc0,0x43,0xa2,0x1c,0x7a,0x85,0xdf,
    918      0x34,0xab,0x57,0xa8,0xf6,0xbc,0xf6,0x84,0x7d,0x05,0x69,0x04,0x83,0x4c,0xd5,
    919      0x56,0xd3,0x85,0x09,0x0a,0x08,0xff,0xb5,0x37,0xa1,0xa3,0x8a,0x37,0x04,0x46,
    920      0xd2,0x93,0x31,0x96,0xf4,0xe4,0x0d,0x9f,0xbd,0x3e,0x7f,0x9e,0x4d,0xaf,0x08,
    921      0xe2,0xe8,0x03,0x94,0x73,0xc4,0xdc,0x06,0x87,0xbb,0x6d,0xae,0x66,0x2d,0x18,
    922      0x1f,0xd8,0x47,0x06,0x5c,0xcf,0x8a,0xb5,0x00,0x51,0x57,0x9b,0xea,0x1e,0xd8,
    923      0xdb,0x8e,0x3c,0x1f,0xd3,0x2f,0xba,0x1f,0x5f,0x3d,0x15,0xc1,0x3b,0x2c,0x82,
    924      0x42,0xc8,0x8c,0x87,0x79,0x5b,0x38,0x86,0x3a,0xeb,0xfd,0x81,0xa9,0xba,0xf7,
    925      0x26,0x5b,0x93,0xc5,0x3e,0x03,0x30,0x4b,0x00,0x5c,0xb6,0x23,0x3e,0xea,0x94,
    926      0xc3,0xb4,0x71,0xc7,0x6e,0x64,0x3b,0xf8,0x92,0x65,0xad,0x60,0x6c,0xd4,0x7b,
    927      0xa9,0x67,0x26,0x04,0xa8,0x0a,0xb2,0x06,0xeb,0xe0,0x7d,0x90,0xdd,0xdd,0xf5,
    928      0xcf,0xb4,0x11,0x7c,0xab,0xc1,0xa3,0x84,0xbe,0x27,0x77,0xc7,0xde,0x20,0x57,
    929      0x66,0x47,0xa7,0x35,0xfe,0x0d,0x6a,0x1c,0x52,0xb8,0x58,0xbf,0x26,0x33,0x81,
    930      0x5e,0xb7,0xa9,0xc0,0xee,0x58,0x11,0x74,0x86,0x19,0x08,0x89,0x1c,0x37,0x0d,
    931      0x52,0x47,0x70,0x75,0x8b,0xa8,0x8b,0x30,0x11,0x71,0x36,0x62,0xf0,0x73,0x41,
    932      0xee,0x34,0x9d,0x0a,0x2b,0x67,0x4e,0x6a,0xa3,0xe2,0x99,0x92,0x1b,0xf5,0x32,
    933      0x73,0x63
    934       };

  $ echo 'isprime(27788893276069724796504555675597658900595616769773727063231875314156885361379100133264804184710789407128574011804155595735704837674243828066040543912171576627544718762752948158991754559261759162739343094515270757451837630913502740443023902769553802723685440839891240497710460941757089246131322686180648463540974702859210630184042730717698427486397505787974799692901205514386555272667298045803284972074823213104807295638814082142694729938965663710648170010420323923305528998108799706139846097432481556448740855888110797022123731105964852194684036975049177742094726795060211226322344210328442014189175085444396370522979)' | gp -q
  1
The original error could have been checked with a quick code review verifying that the provided number was in fact prime. This should have been done when the original patch was submitted.


Oh jeez. That non-prime is evenly divisible by 271 and 13597 (among other things).


Let's assume it's not malice. Can it be a typo? Is there a prime with a low Levenshtein distance (in hex) from the old non-prime number?

(Is there always a prime within a low distance from any number?)


I've tested some of them with PARI/GP. There is no probable prime at Hamming distance 1. There are several (probably 300-400, I haven't exhaustively listed them) probable primes at Hamming distance 2 (p ^ (1<<30) ^ (1<<14) is one example).


About 1 in 700 1024-bit numbers is prime. There are 1023 × 512 = 523,776 Hamming-distance-2 numbers for a 1024-bit number, so about 750 of them will be prime.

If you found no Hamming-distance-1 primes (expected: 0.68) and 300-400 Hamming-distance-2 primes (expected: 750), this is one *really* bad "prime"...
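
The arithmetic behind that estimate (Python 3.8+ for math.comb):

  from math import comb

  n_dist2 = comb(1024, 2)        # = 1023 * 512 = 523776 numbers at Hamming distance 2
  print(n_dist2, n_dist2 / 700)  # ~523776 candidates, ~748 expected primes at 1 in 700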


Well, the provided command takes 2.5 min for PARI to test the given "prime" on my computer. There are 16*256 numbers with a single hex-digit mutation from the number given.

Anyone got a cluster lying around?


The change log message that introduced the new constant is interesting, too. "Socat did not work in FIPS mode because 1024 instead of 512 bit DH prime is required. Thanks to XXX for reporting and sending a patch."

I wonder what prompted the review a year later.


Not just a quick code review; it should be a build time assertion.


I don't think that testing whether huge numbers are prime or not is quite as easy as you assume.

https://en.wikipedia.org/wiki/Primality_test


It's pretty easy. There are ways to test primality such that constructing a counterexample would be an important mathematical result.

https://en.wikipedia.org/wiki/Baillie–PSW_primality_test

> The power of the Baillie-PSW test comes from the fact that these lists of strong Fermat pseudoprimes and strong Lucas pseudoprimes have no known overlap. There is even evidence that the numbers in these lists tend to be different kinds of numbers.


It might be "easy" but is it quick enough to be tolerated as part of a build?


A primality certificate could be included that allows the prime to be verified quickly during the build. This certificate for the 2048-bit prime took 90 seconds to generate, and can be verified in 3 seconds.

http://web.mit.edu/andersk/Public/socat-prime.pl

https://en.wikipedia.org/wiki/Primality_certificate


The primality test of the original number using the gp command that archgoon posted takes 6ms on my Sandy Bridge laptop; that's plenty fast. (The new number takes a minute to check, though, so that might really be too long.)


Maxima's primep takes 360ms on the 1024-bit one for me, so it seems quick enough to stick into the build.

It sounds like the chance of it passing the test is around 10^-15 using the default values, which seems sufficient for a build test which is run many times.


If this is the alternative, it very well might be.


Just build your sensitive hard coded primes in a separate object file. It rarely needs updating.


Kind of interesting that the first nonprime version ended in a comma, as if it originated as a fragment of a longer array of bytes. That's nonstandard C, and I'd expect that whatever tool(s) generate those arrays would know not to add a comma after the final entry.


<cynical thought>That's a nice piece of plausible deniability, huh? I wonder if the committer has a few extra bytes that he can claim were supposed to be there which make a number that passes all the primality tests? That'd be a nice excuse for the NSA/3PLA overlords to have given him...


Extra bytes would have made it not a 1024-bit integer.


Which provides an excuse for why the prime was truncated and, OOPS, made it a non-prime.


The optional comma at the end of the list of entries in an array is explicitly allowed by the C standard.


(Actually I may be confusing the extra comma here with an extra comma at the end of an enum list. Some compilers are OK with that, while others aren't.)


If I'm reading N1256 right (not sure how far back it actually dates, but that's a C standard draft dated 2007), section 6.7.2.3 also explicitly allows a trailing comma in that context as well, so if such a compiler exists its makers should get with the program, so to speak.


It's also there for the previous 512 bit prime. Probably just the style preference of the project.

Edit: Actually, it looks a lot like the code was originally generated using openssl dhparam -2 -C: https://gist.github.com/xnyhps/df16e7d43e32b7dbe3fb


That looks like it's for 'socat', not openssl itself ...


I'm curious, is there a list of known primes held somewhere?


There are plenty; see, for example

https://en.wikipedia.org/wiki/List_of_prime_numbers#External...

But you couldn't hope to list all primes up to 1024 bits.

https://en.wikipedia.org/wiki/Prime-counting_function

https://en.wikipedia.org/wiki/Prime_number_theorem

For example, there are about 2⁸⁰ primes less than 10²⁶, that is, with 26 or fewer digits. (Where could you get 2⁸⁰ bits of storage, even if you could do the huge computation necessary to check each of these?) 1024-bit primes are about 308 digits long.
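
A quick sanity check of the 2^80 figure:

  from math import log, log2

  # pi(10^26) ~ 10^26 / ln(10^26)
  print(log2(10**26 / log(10**26)))   # ~80.5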

You can imagine trying to get an intuition for a few different kinds of things:

* Up to what point has the primality of every number been checked?

* What is the largest general-form number whose primality has been checked conclusively? (or, what is the largest-known general-form prime?)

* What's the largest general-form semiprime that has been factored into primes without foreknowledge of the factors?

* How big are the primes we use in cryptography?

* How big are the largest-known (special-form) primes?

These things are very different orders of magnitude. I'm not sure of the exact answers to the first two, but I expect the first is below 20 digits (the π(n) calculations apparently didn't check all the individual numbers). The answers to the last 3 are: 232 digits, 308 to 1233 digits, and 22338618 digits.


Pi would be a good source, but they are in the wrong order.


Pi would be a horrible source. Why would you want to use a deterministic digit-generation function to generate your entropy? Even if you always used very large digit offsets, I can't imagine it being a remotely good idea.


As a source of primes? OP never said he wanted them to be random, and pi does contain every (finite) number. :P


That is not known to be the case, only conjectured.


I thought it was proven to be transcendental? Or were you saying that this property of transcendental numbers is only conjectured?


Pi is transcendental, which means that it is not a root of a non-zero polynomial with rational coefficients.

Being transcendental does not imply that a number's expansion in a given base must include every digit string. Consider the number 1/10^1! + 1/10^2! + 1/10^3! + 1/10^4! + ....

This number, whose decimal expansion is 0.110001000000000000000001... is transcendental (proven by Liouville in 1844). Its decimal expansion clearly does not contain every decimal number. It only contains the digits 0 and 1, and after the first two places never even contains consecutive 1s.

It is known that "almost all" real numbers do in fact contain in their base b expansion every sequence of base b numbers, each sequence occurring with frequency proportional to its length. These are called "normal" numbers. Very few interesting numbers (where "interesting" means that we have some reason to be interested in aside from their normality) are known to be normal, though.


More correctly, all (AFAIK) currently provably normal numbers have been designed to be normal.


Sorry to go off topic here, but can you give an example of a number that isn't interesting?


By "interesting" I mean a number that arises out of something else that one might be interested in.

For instance, consider pi. If you are interested in geometry, pi will turn up. If you are interested in number theory, pi will turn up (e.g., it is connected to zeta functions). If you are interested in probability and statistics, pi will turn up. If you are interested in differential equations, pi will come to the party.

If you somehow have never encountered pi, I can convey it to you by telling you about one of those things. For instance, I could tell you that it is the period of the non-zero solutions of the differential equation y'' + y = 0.

An uninteresting number would be one that has no known connection to other things. If I have a particular uninteresting number, and I want to convey it to you, I'll have to just tell you the number.

A random number would almost certainly be uninteresting, such as this hex fraction, which came from /dev/urandom on my computer: 0.bfdab557104bf2d8952fb1ea0adfd732794a353d5b35d95cda927f4ad8f6dd11f11b2e968298. It is extremely unlikely that anyone has ever seen that number before. The only known thing interesting about it is that it was made specifically as an example of a number that is not otherwise interesting.


Generate some random numbers. They aren't interesting.


Pi is transcendental, but that has nothing to do with "containing every number". A transcendental number is defined as a number which is not the root of any non-zero polynomial with rational coefficients. The two properties are not related. For example, the first example of a transcendental number (the Liouville numbers) isn't capable of being base-10 normal (it only contains the digits 0 and 1).

The property you're referring to is related to normality. A normal number in a base b is a number where the frequency of digits in that base approaches 1/b, but is not a rational number (and thus does not have cycles). Pi has not been proven to be normal, but if it were then it would have the property of which you speak (which is an informal property provided by normality).


:D


It's interesting that in all that time no one noticed it was not actually prime. This lends support to skepticism about the assumption that widely used, security-critical open source code is reviewed by anyone competent at evaluating it, even over long periods of time.



I think most people are glossing over the first part of the title. Why is the DH p parameter hardcoded? Why not just generate one on each startup?


Generating those parameters is very slow. Just run `time openssl dhparam -text -noout 1024` a few times and see for yourself.


Then do it on first run and store it in ~/.socat_prime and do a primality check each time it's loaded.


The older version I was using, 1.7.2.4, is not affected. So much for "updates".



