
Academics Make Theoretical Breakthrough in Random Number Generation - oolong_decaf
https://threatpost.com/academics-make-theoretical-breakthrough-in-random-number-generation/118150/
======
tptacek
I'm sure this is as important to computer science as the article claims, but
not having even read the paper I can say pretty confidently that it isn't
going to have much of an impact on computer security. Even if it became far
easier to generate true random numbers, it wouldn't change (a) how we generate
randomness at a systems level or (b) what goes wrong with randomness.

Our problem with cryptography is _not the quality of random numbers_. We are
fine at generating unpredictable, decorrelated bits for keys, nonces, and IVs.
Soundly designed systems aren't attacked through the quality of their entropy
inputs†.

The problem we have with randomness and entropy is logistical. So long as our
CSPRNGs need initial, secret entropy sources of any kind, there will be a
distinction between the insecure state of the system before it is initialized
and the (permanent) secure state of the system after it's been initialized.
And so long as we continue building software on general purpose operating
systems, there will be events (forking, unsuspending, unpickling, resuming
VMs, cloning VMs) that violate our assumptions about which state we're in.

Secure randomness isn't a computational or cryptographic problem (or at least,
the cryptographic part of the problem has long been thoroughly solved). It's a
systems programming problem. It's back in the un-fun realm of "all software
has bugs and all bugs are potential security problems".

It's for that reason that the big problem in cryptography right now isn't
"generate better random", but instead "factor out as much as possible our
dependence on randomness". Deterministic DSA and EdDSA are examples of this
trend, as are SIV and Nonce-Misuse Resistant AEADs.
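
To make the deterministic-nonce idea concrete, here's a rough sketch of
the RFC 6979 flavor of it (illustrative only; the real spec derives the
nonce through an HMAC-DRBG loop and reduces it mod the group order):

    import hashlib, hmac

    def deterministic_nonce(private_key: bytes, message: bytes) -> bytes:
        # Derive the signature nonce from the key and the message instead
        # of from an RNG: the same inputs always give the same nonce, so a
        # broken RNG can never cause the repeated-nonce failures that have
        # leaked DSA/ECDSA keys in the past.
        return hmac.new(private_key, message, hashlib.sha256).digest()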

† _(unsound systems frequently are, but that just makes my point for me)_

------
hannob
While this may be an interesting theoretical result it almost certainly has
zero practical implications for cryptography.

We already know how to build secure random number generators. Pretty much
every real world problem with random numbers can be traced back to people not
using secure random numbers (or not using random numbers at all due to bugs)
or using random number generators before they were properly initialized (early
boot time entropy problems).

This random number thing is clouded in mystery, and a lot of stuff gets
proposed that solves nothing (like quantum RNGs), along with stuff that's
more folklore than anything else (depleting entropy and the whole
/dev/random story). In the end it's quite simple: you can build a secure
RNG out of any secure hash or symmetric cipher. Once you've seeded it with
a few dozen random bytes, it's secure forever.
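
To illustrate, a minimal sketch of such a hash-based generator (toy code,
not a production DRBG; real designs like HMAC-DRBG add reseeding and
backtracking resistance):

    import hashlib

    class HashRNG:
        def __init__(self, seed: bytes):
            # secret internal state; 32 random seed bytes is plenty
            self.state = hashlib.sha256(seed).digest()
            self.counter = 0

        def next_block(self) -> bytes:
            # hash (secret state || counter); without the seed, an
            # attacker cannot predict any output block
            self.counter += 1
            return hashlib.sha256(
                self.state + self.counter.to_bytes(8, "big")).digest()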

~~~
erikb
I'm far from being an expert, but I doubt that.

"it's quite simple" \- Are we talking security here? Then that phrase can't
apply.

"You can build a secure <something> out of _any_ <something else>." \- Is that
how security works?

"it's secure _forever_." \- Gung'f jung Pnrfne fnvq. (This sentence is
"encrypted" in case that's not clear)

~~~
dkopi
Pnrfne pvcure vf nyfb xabja nf EBG13

~~~
erikb
Yeah, one of them. I think the number is flexible for good old Caesar.

------
oolong_decaf
Here's a link to the actual paper: [http://eccc.hpi-web.de/report/2015/119/](http://eccc.hpi-web.de/report/2015/119/)

------
electrograv
> _We show that if you have two low-quality random sources—lower quality
> sources are much easier to come by—two sources that are independent and have
> no correlations between them, you can combine them in a way to produce a
> high-quality random number_

"Independent and no correlations" sounds like a crippling assumption if you
want to use any two deterministic PSRNGs. How can you possibly guarantee
they're completely un-correlated and independent without seeding them with
collectively more bits of entropy than you can get out of the combined system?

I'm not sure what "independent" is even supposed to mean for a deterministic
sequence, which by definition is recursively dependent.

~~~
njohnson41
That's because this result is not about combining weak deterministic PRNGs,
it's about combining entropy sources (like two hardware random number
generators).

This has always been possible, but it sounds like they've lowered the minimum
entropy needed in the source streams to produce a high-quality output.

~~~
electrograv
Thanks for the explanation, that makes sense. I think this quote threw me off:

> _The academics’ latest work hurdles those restrictions allowing the use of
> sequences that are only weakly random_

What does "weakly random" mean, if not a PRNG? Just low pure entropy per bit
of sequence data? What's the threshold then between strong random and weak
random -- wouldn't it be a continuum of entropy?

Minor nitpick: Also, how can a deterministic PRNG have less entropy (0) than
that of its seed?

~~~
njohnson41
Right, the amount of entropy per bit of sequence is always between 0
(deterministic) and 1 (every bit is independent and 50/50) (... or between 0
and log2(k) in general if the element varies over a set of k things). These
"weak" sources just have low entropy per bit. They could be biased (more 0s
than 1s) or correlated (long runs of 0s/1s or periodicity), or just have some
other pattern that sometimes holds.

A deterministic PRNG's _sequence_ has exactly the entropy of its seed,
actually, but it has 0 bits of entropy per symbol, because its sequence
is infinite.

The thing most people get confused about with entropy is thinking that
entropy is a property of some single object, like a bit string. Really,
entropy is always a measurement of a probability distribution, just like
the mean or variance is. In the usual case with random streams, the
distribution is P(x_i | x_{i-1}, ..., x_0) for bits x_i in the stream,
i.e. the distribution remaining for the current bit even if we know all
previous bits. For a deterministic PRNG, once we can extract the seed
from the history (given unlimited compute power) that distribution
becomes deterministic, so the entropy is 0.

~~~
dietrichepp
The entropy of a single object is a meaningful concept. It is usually called
Kolmogorov complexity.

~~~
njohnson41
Kolmogorov complexity is definitely meaningful, but it's not (Shannon)
entropy, just conceptually similar. Many people think of something like
Kolmogorov-complex sequences when they think of "random" sequences, which is
(IMO) why they have trouble thinking of entropy as being about a probability
distribution.

The one case where they coincide (sort of) is if you believe your random
sequence is generated by a randomly chosen Turing machine, which I've only
really seen in philosophical settings.

A uniformly chosen 64-bit integer still has exactly 64 bits of entropy,
regardless of how much Kolmogorov complexity the actual bits you generate
have.

------
beambot
Reminds me of the Von Neumann method of using a biased coin to generate
unbiased random coin flips:
[http://web.eecs.umich.edu/~qstout/abs/AnnProb84.html](http://web.eecs.umich.edu/~qstout/abs/AnnProb84.html)

(Edit: not the algo itself, just the notion of combining randomness.)
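
For reference, the von Neumann trick itself is tiny (a sketch, assuming
independent flips with a fixed but unknown bias):

    import random

    def biased_flip(p=0.9):
        # toy biased source: returns 1 with probability p
        return 1 if random.random() < p else 0

    def von_neumann_bit():
        # Draw flips in pairs: emit the first flip when the pair differs,
        # discard (0,0) and (1,1). Both accepted outcomes occur with
        # probability p*(1-p), so the output is unbiased for any p.
        while True:
            a, b = biased_flip(), biased_flip()
            if a != b:
                return a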

~~~
cvwright
Cool. But note that this is not a problem you're likely to run into in
practice. Gelman and Nolan claim that you can't actually bias a coin.

[http://www.stat.berkeley.edu/~nolan/Papers/dice.pdf](http://www.stat.berkeley.edu/~nolan/Papers/dice.pdf)

~~~
paulgerhardt
How to create an unfair coin and prove it with math:
[https://izbicki.me/blog/how-to-create-an-unfair-coin-and-
pro...](https://izbicki.me/blog/how-to-create-an-unfair-coin-and-prove-it-
with-math.html)

~~~
babuskov
> Amazingly, it takes some pretty big bends to make a biased coin. It’s not
> until coin 3, which has an almost 90 degree bend that we can say with any
> confidence that the coin is biased at all.

Basically, it shows that short of extreme deformation, creating a biased
coin really is impossible.

~~~
andreasvc
It is impossible by bending. It's probably still possible (not to say
practical) by crafting a coin with a hidden compartment of a heavier metal.

~~~
babuskov
The paper states that this would only alter the center of mass. If you catch
the coin or it lands on a soft surface, it has no effect, because the outcome
is determined by the time when it stops. If you allow it to bounce, then yes.

TL;DR: if you want unbiased coin IRL, make sure you catch it before it hits
the ground.

------
Dagwoodie
What makes randomness so hard? I had this crazy thought a while back and
have been wondering if it would work out:

Say you took a small disk-shaped object like a hockey puck with a window
on it and filled it with sand: 50% white sand and 50% black sand. Inside
the puck would be blades attached to a motor, rotating slowly to
constantly change the pattern. The pattern formed in the window would be
truly random, wouldn't it? You could mount this to a PCIe card with a
camera...

~~~
cjslep
People that want seriously random numbers use radioactive decay because the
underlying physical phenomena (described via quantum mechanics) is
fundamentally random and cannot predict the time, energy, and direction of
decay all at the same time (as far as my limited understanding is aware).

~~~
marcosdumay
Diode avalanche current has the same fundamental randomness (predicted by QM),
and does not require radioactive elements.

Also, it integrates very well: the smaller the diode, the better the
randomness you'll get.

~~~
spacemanmatt
I've heard you can just sample a noisy resistor, too.

~~~
marcosdumay
Resistors do not have that nice property that a single particle can start a
macroscopic cascade of events.

But, yes, with enough precision a resistor would do.

------
deckar01
> Abstract:

> We explicitly construct an extractor for two independent sources on n
> bits, each with min-entropy at least log^C(n) for a large enough
> constant C. Our extractor outputs one bit and has error n^(-Ω(1)). The
> best previous extractor, by Bourgain, required each source to have
> min-entropy 0.499n.

> A key ingredient in our construction is an explicit construction of a
> monotone, almost-balanced boolean function on n bits that is resilient
> to coalitions of size n^(1-δ), for any δ > 0. In fact, our construction
> is stronger in that it gives an explicit extractor for a generalization
> of non-oblivious bit-fixing sources on n bits, where some unknown n-q
> bits are chosen almost polylog(n)-wise independently, and the remaining
> q = n^(1-δ) bits are chosen by an adversary as an arbitrary function of
> the n-q bits. The best previous construction, by Viola, achieved
> q = n^(1/2-δ).

> Our explicit two-source extractor directly implies an explicit
> construction of a 2^((log log N)^O(1))-Ramsey graph over N vertices,
> improving bounds obtained by Barak et al. and matching independent work
> by Cohen.

[http://eccc.hpi-web.de/report/2015/119/](http://eccc.hpi-web.de/report/2015/119/)

------
dave2000
What is the possibility that this is an attack on cryptography: convince
people that it's safe to produce random numbers this way using an
inaccurate "proof", and then have an easy/easier time decrypting stuff
produced by anyone who uses it?

~~~
swordswinger12
Exactly zero. The authors are well-known theory researchers at a major
university, not NSA double-agents.

Also, this paper was peer-reviewed and published at one of the top theory
conferences in the field. This doesn't guarantee the proof is correct, but it
means it received a certain level of scrutiny during the review process.

Also also, this paper being public (and high-profile) means that the
probability of some mythical 'bug' in the proof remaining undiscovered for
long enough for the technique to be applied to real systems is exactly zero.

~~~
dave2000
"The authors are well-known theory researchers at a major university, not NSA
double-agents."

I don't understand. Isn't this like the family of someone accused of spying
saying "he's not a spy, he's a teacher"?

~~~
swordswinger12
Dave Zuckerman has been doing theoretical CS research for about twenty-five
years. Are you saying you think it's likely that either (a) he was an NSA
double agent this whole time, or (b) he recently started doing clandestine
work for the NSA inserting backdoors into abstract theoretical results about
Ramsey graphs and two-source extractors?

~~~
dave2000
I'm not saying anything about this particular person; just amused as to
people's attitudes towards people who might be engaged in secret work on
behalf of a government, as if detecting such a person is straightforward, or
that people have "spy" on their passport, etc. If you look at the history of
spying, leakers and double agents you can see people do it for all sorts of
reasons; money, blackmail, belief that your country is doing something wrong
or that you need to help your country defeat another country's ideology etc.

~~~
mywittyname
I think the general consensus is that it doesn't matter whether or not he is
some clandestine NSA agent because his paper is just a theoretical proof that
is almost entirely removed from any implementations based on his work. If
there is some fundamental flaw in his work, then it's likely to be discovered
before RNGs based on this work come into widespread use.

It would be much easier to just code a flaw into the actual implementations of
RNGs based on this.

------
wfunction
Could someone explain why XORing the outputs of the two sources isn't optimal?

~~~
tlb
Suppose both sources are 90% zeros. Their XOR will be mostly (82%) zeros too.
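
(The XOR is 0 exactly when the two bits agree: 0.9*0.9 + 0.1*0.1 = 0.82.)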

~~~
caf
And in terms of Shannon entropy, the original two bits have 0.469 bits of
entropy each, while the resulting XORed bit has 0.68 bits of entropy. So
this extractor is far from ideal.

~~~
caf
(Actually for the "90% zero" sources, OR would be a slightly better extractor
than XOR, yielding a resulting bit with 0.70 bits of entropy).
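
A quick script to check those figures:

    from math import log2

    def H(p):
        # Shannon entropy of a bit that is 1 with probability p
        return -p * log2(p) - (1 - p) * log2(1 - p)

    p = 0.10                   # each source bit is 1 with probability 0.1
    print(H(p))                # single source bit: ~0.469 bits
    print(H(2 * p * (1 - p)))  # XOR of two bits:   ~0.680 bits
    print(H(1 - (1 - p)**2))   # OR of two bits:    ~0.701 bits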

------
jaunkst
I have always wondered why we don't introduce physical randomness into
cryptography. Let's take scalability out of the question and look at the
problem at the fundamental level. If we used a box of sand that shifted
each time a random number was requested, and a camera to scan it and
produce a number from this source, would it not be more random than any
other method? I'm not a professional in this field; I'm just genuinely
asking why not.

~~~
sobellian
New Intel chips use a special bistable circuit to generate random bits in
hardware, but many people don't seem to trust it (talk of back doors and
such).

Frankly, it seems to me that if you're worrying about the intentions of your
CPU manufacturer, you're screwed anyway. I guess that I also haven't put that
much thought into it though.

~~~
im3w1l
Most kinds of CPU misbehavior are detectable: "-Did it add the two numbers
correctly? -No it did not" Randomness is different: "-Did it generate a random
number? -Well it generated some kind of number...."

------
Cieplak
Does this imply that XORing /dev/urandom with /dev/random is a good practice?

PS: Thanks for clarifying @gizmo686. The Arch Linux wiki suggests that
/dev/urandom re-uses the entropy pool that /dev/random accumulates, so
this is indeed a _BAD_ idea.

I found this helpful as well:
https://en.wikipedia.org/wiki/Randomness_extractor

Overall, their construction reminds me of a double pendulum, which is one
of the simplest examples of deterministic chaos.

~~~
cyphar
/dev/urandom and /dev/random are identical in every way, except that
/dev/random blocks when the "entropy estimator" decides there "isn't enough
entropy". However, due to the properties of the CSPRNG they use, such
estimates have dubious value. Overall, you should always use /dev/urandom.

But definitely don't XOR the two.
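
In code that just means the boring one-liner, e.g. in Python:

    import os

    key = os.urandom(32)  # 256 bits from the kernel CSPRNG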

~~~
Bromskloss
Why does /dev/urandom use up the nice entropy of /dev/random when it doesn't
provide any guarantees anyway?

Also, do you mean that /dev/urandom should be used even for cryptographic
applications?

~~~
cyphar
What do you mean by "use up the nice entropy"? You can use /dev/urandom for
cryptographic applications (in fact _you should_).

Here's a nice article about it: [http://www.2uo.de/myths-about-urandom/](http://www.2uo.de/myths-about-urandom/)

~~~
Bromskloss
I shall read your link in full at a later time. For now, is there any
problem with using /dev/random other than that it is blocking?

------
marshray
How is this different than taking two independent bits with < 1 bit entropy
and XORing them together to combine their entropy? (up to a max of 1 full bit)

~~~
eru
For one, XOR doesn't remove bias. (If both sources are biased, so will be
their XOR.)
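
(Concretely: if each bit is 1 with probability p, their XOR is 1 with
probability 2p(1-p), which equals 1/2 only when p = 1/2. The bias shrinks
with each XOR of independent bits, but it never vanishes.)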

------
Houshalter
I read the article and the comments and I'm still confused why this is
important.

I mean it sounds trivial. Why not take the hash of the first random
number and XOR it with the second random number? Then optionally hash the
output and use that as a seed for an RNG. If any part of the process
isn't very random, that's fine; it's still nearly impossible to reverse
and doesn't hurt the other parts.

~~~
coldtea
Hashing low entropy random sources is not going to get you high quality random
seeds...

~~~
Houshalter
Why not? Xor it with a random constant if it's sensitive to zeros. But you
aren't going to get any more entropy out of it than any other method.

------
csense
"...if you have two low-quality random sources...you can combine them in a way
to produce a high-quality random number..."

I tried to skim the paper, but it's really dense. Can someone who understands
it explain how what they did is different than the obvious approach of running
inputs from the two sources through a cryptographically strong hash function?

~~~
pbsd
The most notable thing is the lack of assumptions. You don't have to assume
anything other than the min-entropy of the sources. With a hash function, you
generally need to assume that the hash function is a good extractor,
instead of proving that it is one. Here, by contrast, the extractor is
built up from first principles.

Also, note that you need at least two independent sources. This is because
with a single source you just can't have a good extractor for _any_ imperfect
source. For example, imagine you are using a hash function to extract one
uniform bit from an n-bit input. If the input source is uniform over the
set of all n-bit strings x such that H(x) = 0 (a set with pretty high
min-entropy), you end up with a 'randomness extractor' that is just a
constant function. This is highly
contrived, but that is what you have to work with when you say "any" in
theorem statements.

To fix this, you need at least two sources, where one source 'keys' the other,
and therefore as long as the sources aren't working together (i.e., they are
independent), there is nothing they can do to sabotage our randomness
extraction. The inner product is a simple example of an extractor that works
as long as each source has at least n/2 min-entropy. I have no idea what the
construction of this new paper looks like, but it's an improvement on the
min-entropy required. However, since this improvement is asymptotic, it's
unclear whether it is useful at all for realistic ranges of length and
min-entropy.
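
For concreteness, a toy version of that inner-product extractor (a
sketch; it has nothing to do with the new paper's construction):

    def inner_product_extract(x, y):
        # Two-source extractor: inner product of two bit vectors over
        # GF(2). If x and y are independent and each has somewhat more
        # than n/2 bits of min-entropy, the output bit is close to
        # uniform.
        assert len(x) == len(y)
        bit = 0
        for a, b in zip(x, y):
            bit ^= a & b
        return bit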

In reality, this is entirely irrelevant for practical purposes. The work-horse
tools of randomness extraction in practice are hash functions, block ciphers,
and universal hashes, so this new paper is interesting from a theoretical
point of view only. Yet another example of university PR departments being
insultingly misleading in their press releases.

------
kovvy
How well does this handle a biased source of random numbers in one or more of
the inputs? If someone has set up your random number source to be more easily
exploitable (or just done a really bad job setting it up), does combining it
with another poor source with this approach mean the results are still useful?

------
wfunction
Isn't "Independent and no correlations" redundant? How can two random
variables be independent but correlated?

~~~
cjslep
Independence is the stricter notion: it's a statement about the
_population_, namely that the joint distribution factorizes. Correlation
only measures (linear) association, and it is usually estimated from a
_sample_, so a sample correlation can be nonzero by chance even between
truly independent variables (and, conversely, dependent variables can be
uncorrelated). Correlation can help guide statisticians to finding a
dependence between variables, but it can also mislead, which gives rise
to the "correlation does not imply causation" adage.

Examples of (presumably) independent variables showing spurious sample
correlation: [http://www.tylervigen.com/spurious-correlations](http://www.tylervigen.com/spurious-correlations)

------
nullc
But can anyone extract the algorithm from the paper?

:)

------
mirekrusin
Can someone explain why it's considered so hard to get randomness? I
mean, you can turn on an old radio and hear random noise; is it hard to
build a tiny antenna into the computer?

------
bootload
another article via UT (Uni. Texas), _"New Method of Producing Random
Numbers Could Improve Cybersecurity"_ ~
[http://news.utexas.edu/2016/05/16/computer-science-advance-could-improve-cybersecurity](http://news.utexas.edu/2016/05/16/computer-science-advance-could-improve-cybersecurity)

------
Bromskloss
> A source X on n bits is said to have min-entropy at least k if

Can a rigorous definition of "source" be found somewhere?

~~~
sn41
I think it is a random variable X whose support is a subset of the set of
n-length strings.

- i.e. if Pr[X = w] > 0, then w has length n.
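
(For reference, the quoted definition finishes in the standard way: X has
min-entropy at least k if Pr[X = w] <= 2^(-k) for every string w, i.e. no
single outcome is too likely.)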

------
roschdal
“We show that if you have two low-quality random sources—lower quality sources
are much easier to come by—two sources that are independent and have no
correlations between them, you can combine them in a way to produce a high-
quality random number,”

So Math.random() * Math.random() ? :)

~~~
plainOldText
Perhaps Math.random()@computer1 and Math.random()@computer2

~~~
willvarfar
As long as you ensure they are different algos and have different seeds...?

As I write that I get a funny "no way" feeling. This feels so unnatural
that if it's true, it is indeed a breakthrough. I have to read the paper
again.

When people complained about Linux mixing the Intel hardware RNG into its
entropy pool, Linus replied that mixing in low-quality sources still
yields good entropy, and he got a lot of stick for that.

That they are on different computers is immaterial.

~~~
merijnv
Linus actually argued something slightly different. That discussion was about
whether a "bad" random source can weaken a "good" source. This paper is about
constructing a "good" source from two independent "bad" sources.

The problem with Linux was people complaining that mixing the Intel RNG into
the existing entropy pool would lower the entropy of the entire pool,
effectively letting a backdoored Intel RNG render the random system
untrustworthy. Linus' response was the fairly simple/logical observation that
"that's not how entropy works".

Suppose we have a random bitstring with entropy X and we XOR it with a
completely deterministic bitstring. What's the result? Clearly a
bitstring that STILL has entropy X, because the result is completely
determined by our initial bitstring. Mixing with a deterministic
bitstring doesn't make the output of the XOR any more predictable than
the original bitstring; it's equally random.
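
A quick way to see this: XOR with any fixed string is a bijection, so it
merely relabels outcomes without changing their probabilities. For
instance:

    from collections import Counter
    import random

    # XOR with a fixed byte permutes the 256 byte values, so a uniform
    # random byte stays uniform and its entropy is unchanged.
    fixed = 0xA5
    samples = [random.randrange(256) ^ fixed for _ in range(100000)]
    print(Counter(samples).most_common(3))  # all counts near 100000/256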

So Linus' argument was: IF the Intel RNG is completely backdoored and
deterministic, mixing it into the entropy pool will have no effect on the
entropy in the pool. HOWEVER, if the Intel RNG is anything but completely
deterministic, i.e. there is even a tiny bit of randomness in it, this
will actually INCREASE the pool's entropy.

So mixing a completely backdoored RNG will have no negative impact, but mixing
anything that's not 100% predictable will have a positive impact, so there's
no reason to not always mix the hardware RNG with the pool.

~~~
Natanael_L
That's also assuming it isn't maliciously correlated to produce an output that
after XOR leaks secret entropy.

~~~
merijnv
Sure, if your hardware RNG is inspecting your RAM and trying to
compromise your entropy pool, that could be done, but:

1 - The entropy mixing is more complex than simply XOR, making such a thing
considerably harder

2 - If you expect this level of backdooring from your CPU, you have bigger
problems :)

------
ninjakeyboard
praise RNGesus!

