
Better Random Number Generation for OpenSSL, Libc, and Linux Mainline - gbrown_
https://aws.amazon.com/blogs/opensource/better-random-number-generation-for-openssl-libc-and-linux-mainline/
======
colmmacc
If you work on crypto, or any application where sensitive data is handled,
then this short post on the WIPEONFORK option may be interesting:

[http://www.metzdowd.com/pipermail/cryptography/2017-November...](http://www.metzdowd.com/pipermail/cryptography/2017-November/033133.html)

~~~
throwaway613834
How does this work with heap allocation? When you free memory and then
allocate it again and get back the same address range, wouldn't the flag still
apply to the new block?

~~~
colmmacc
madvise() works at the page level and if you're using madvise(), then you're
expected to be using mmap() to allocate pages explicitly. It would be unusual
and a bad idea to start madvise()ing memory that's being handled by an
allocator (like malloc).

Of course you could also write a security-focused heap allocator that uses
these madvise() options under the hood.
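
To make the page-level point concrete, here is a minimal sketch of allocating secret memory with mmap() and marking it MADV_WIPEONFORK. The helper names (`secret_pages_alloc`, `secret_pages_free`) are hypothetical, and the fallback value 18 for MADV_WIPEONFORK is an assumption taken from the generic Linux headers, for toolchains whose headers predate Linux 4.14.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* MADV_WIPEONFORK arrived in Linux 4.14; older headers may not define it.
 * 18 is the value in the generic Linux headers (an assumption here). */
#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18
#endif

/* Hypothetical helper: map whole pages for secrets and ask the kernel to
 * present them as zero-filled in any child created by fork().
 * Returns NULL if the mapping fails or the kernel rejects the advice. */
static void *secret_pages_alloc(size_t len, size_t *mapped_len)
{
    assert(len > 0 && mapped_len != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    /* madvise() works on whole pages, so round the request up. */
    size_t rounded = (len + (size_t)page - 1) / (size_t)page * (size_t)page;

    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (madvise(p, rounded, MADV_WIPEONFORK) != 0) {
        munmap(p, rounded);   /* e.g. EINVAL on kernels older than 4.14 */
        return NULL;
    }

    *mapped_len = rounded;
    return p;
}

/* Unmapping returns the pages to the kernel, so the advice cannot leak
 * onto some later, unrelated allocation at the same address. */
static void secret_pages_free(void *p, size_t mapped_len)
{
    munmap(p, mapped_len);
}
```

Because the region comes from its own mmap() call rather than from malloc(), the flag never ends up attached to memory the general-purpose allocator later hands to unrelated code.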

------
jedberg
Before I clicked I was expecting this to be a "random numbers as a service"
announcement, based on lava lamps or something.

And yes, I know CloudFlare did the lava lamps thing, as a bunch of other
people did before them. But it would be totally in line with Amazon to have
a "random numbers as a service" service to compete with random.org.

~~~
colmmacc
[http://docs.aws.amazon.com/cli/latest/reference/kms/generate...](http://docs.aws.amazon.com/cli/latest/reference/kms/generate-random.html)

:)

~~~
jedberg
Right, but that doesn't use _lava lamps_ or _background atmospheric
disturbances_. :)

It said "better random numbers" so I was hoping for something better than KMS
random numbers.

~~~
tankenmate
I think the real value of this software is the fact that it has been proven to
perform as designed, both theoretically and in implementation.

------
bogomipz
I had a question about this passage:

> "Firstly, as mentioned, s2n uses the so-called “Prediction resistant” mode,
which means that s2n feeds the random number generator with more entropy (from
hardware) on every call, as long as the right hardware is available"

Would this be the RDRAND instruction or something else?

~~~
colmmacc
We use RDRAND if it's available, and we also have our own hardware entropy
generators in some systems. In either case, we use this entropy as additional
input that provides prediction resistance, and it's fed into the AES_CTR
generator. Here's a picture from the blog post:

[https://d2908q01vomqb2.cloudfront.net/ca3512f4dfa95a03169c5a...](https://d2908q01vomqb2.cloudfront.net/ca3512f4dfa95a03169c5a670a4c91a19b3077b4/2017/11/22/s2ndrbg-1024x507.png)

Basically when the DRBG is initialized we start with /dev/urandom and the
personalization string. That sets up the initial state, and then hardware
entropy is "mixed in" every time the generator is used. Because the mixing in
happens before AES is applied, that means that even if either stream of
entropy is biased, AES will randomize it anyway (or else AES is a broken
cipher!). That means that even if you controlled both inputs, it would still
require at least as much computational work as it takes to compute AES to be
able to generate a predictable output. That's a nice property of the DRBG
construction.
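
A toy sketch of that data flow, loosely following the shape of a CTR-mode DRBG (instantiate, update, generate). Everything here is an assumption made for illustration: `toy_block` is a simple 64-bit mixing function standing in for AES and is NOT a cipher, the state is shrunk to two 64-bit words, and the seed, personalization string, and hardware entropy are passed in as words the caller has already gathered. It is not s2n's code and must not be used as an RNG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for AES: a keyed 64-bit mixing function. NOT a cipher, NOT
 * secure; it only marks where the block cipher sits in the flow. */
static uint64_t toy_block(uint64_t key, uint64_t counter)
{
    uint64_t z = counter ^ key;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Toy state: a key and a counter, one word each. */
struct toy_drbg {
    uint64_t key;
    uint64_t v;
};

/* Update: run the block function in counter mode under the current key,
 * XOR in the provided data, and adopt the result as the new key/counter. */
static void toy_update(struct toy_drbg *d, const uint64_t data[2])
{
    uint64_t new_key = toy_block(d->key, d->v + 1) ^ data[0];
    uint64_t new_v   = toy_block(d->key, d->v + 2) ^ data[1];
    d->key = new_key;
    d->v   = new_v;
}

/* Instantiate: combine the initial seed (e.g. read from /dev/urandom by
 * the caller) with the personalization string, then run one update. */
static void toy_instantiate(struct toy_drbg *d, const uint64_t seed[2],
                            const uint64_t personalization[2])
{
    assert(d != NULL);
    uint64_t material[2] = { seed[0] ^ personalization[0],
                             seed[1] ^ personalization[1] };
    d->key = 0;
    d->v   = 0;
    toy_update(d, material);
}

/* Generate one output word. Fresh hardware entropy is folded into the
 * state first, so it passes through the block function before any
 * output is produced. */
static uint64_t toy_generate(struct toy_drbg *d, const uint64_t hw_entropy[2])
{
    toy_update(d, hw_entropy);              /* mix in before generating */
    d->v += 1;
    uint64_t out = toy_block(d->key, d->v); /* counter-mode output */
    toy_update(d, hw_entropy);              /* step the state forward */
    return out;
}
```

The only point of the sketch is the ordering: the extra input changes the key and counter, and every output word is then the block function applied under that updated state.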

~~~
bogomipz
Thanks for the clarification and detailed response. Cheers!

------
VeejayRampay
Glad to see some open source work benefiting everyone, especially coming from
the likes of Amazon.

------
NelsonMinar
This bit is an aside but it's interesting to me: "Earlier this year, with
Galois, we completed a proof of our implementation of AES_CTR_DRBG, and
formally verified that our code in s2n is equivalent to the written
specification." That's amazing! I think it may be the first bit of provably
correct code that I'm likely to actually use in practice.

------
jabl
> Recently, both the OpenSSL and glibc projects have been looking to replace
> their random number generators. They, too, are going with AES_CTR_DRBG,
> based in some part on the work in s2n and the availability of formal
> verifications that can be applied to code. That’s pretty sweet.

I tried to search for this csprng in glibc, but I wasn't able to find
anything. Can anyone provide more info?

------
revelation
So we mark our super secret entropy memory area WIPEONFORK. Now we fork and
our super secret entropy memory area feeding all RNG is all zeroes. We've
implemented a Sony RNG!

So we still need to detect forks. Not sure this is entirely the silver bullet
the post makes it out to be. Maybe fork() is just a bad idea, for any number of
reasons.

~~~
colmmacc
bmm6o's reply captures the approach; you WIPEONFORK a guard variable, and use
the state of that variable to re-initialize.
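
A minimal sketch of that guard-variable idea, under some assumptions: the names (`fork_guard`, `fork_guard_init`, `rng_check_fork`) are hypothetical, `rng_reseed` is a stand-in that only counts calls, and the fallback value 18 for MADV_WIPEONFORK is taken from the generic Linux headers.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18   /* assumption: value from generic Linux headers */
#endif

/* The guard lives on its own wipe-on-fork page.
 * Nonzero means "the RNG has been seeded in this process". */
static volatile unsigned char *fork_guard = NULL;

/* Stand-in for the library's real reseed routine; it only counts calls. */
static unsigned long reseed_count = 0;
static void rng_reseed(void) { reseed_count++; }

/* Returns 0 on success, -1 if WIPEONFORK is unavailable, in which case
 * the caller needs a different fork-detection method. */
static int fork_guard_init(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    if (madvise(p, (size_t)page, MADV_WIPEONFORK) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    fork_guard = p;   /* fresh anonymous pages start out zeroed */
    return 0;
}

/* Call at the top of every RNG entry point. A zero guard means either
 * "never seeded" or "the kernel wiped this page in a forked child";
 * both cases need fresh seed material. */
static void rng_check_fork(void)
{
    assert(fork_guard != NULL);   /* fork_guard_init() must have succeeded */
    if (*fork_guard == 0) {
        rng_reseed();
        *fork_guard = 1;
    }
}
```

The RNG state itself does not need to be readable after a fork; only the one-byte guard does, and a zero there is the signal to throw the inherited state away and start over.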

When we're writing a crypto library, like s2n, or OpenSSL, we don't get a
choice to avoid fork() ... the application can fork() any time it wants and we
just have to be able to deal with it. There have been a few approaches:

* Always feed new entropy on every call (but this isn't always available)

* Use getpid() to detect a fork() ... but this can break because some versions of libc cache the value of getpid(). Another problem is the grandchild problem, where an application can fork(), have the child start a new process group, then the parent exits, then the child fork()s again ... in that situation there's some chance that the grandchild ends up with the same PID as the original process that started everything (and may have used the RNG).

* To fix that, one clever approach that BoringSSL used for a while was to check both getpid() and getppid(), which significantly reduced the likelihood of occurrence but was still probabilistic.

* All sorts of weird clone() flags, that things like language runtimes and virtual machines use, can bypass pthread_atfork and PID changes.
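
For comparison, the getpid()/getppid() check from the list above can be sketched as follows. This shows the idea only, not BoringSSL's code; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping: the PID and parent PID observed when the
 * RNG was last seeded. */
struct fork_snapshot {
    pid_t pid;
    pid_t ppid;
};

/* Record the current process identity. */
static void fork_snapshot_take(struct fork_snapshot *s)
{
    assert(s != NULL);
    s->pid  = getpid();
    s->ppid = getppid();
}

/* Returns 1 if either value differs from the snapshot, which is treated
 * as "we may be in a forked child, so reseed". Still probabilistic: a
 * descendant can in principle see the same pair again, and a cached
 * getpid() in libc or a raw clone() can hide a fork entirely. */
static int fork_snapshot_changed(const struct fork_snapshot *s)
{
    assert(s != NULL);
    return s->pid != getpid() || s->ppid != getppid();
}
```

A false positive is harmless here: if the parent exits and the process is re-parented, getppid() changes and the library simply reseeds once more than it needed to.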

~~~
revelation
What's wrong with setting it DONTFORK and handling the page fault in the
process? Someone can still prevent you from correctly handling the fault in
the new process, but at least that keeps the RNG in an obvious broken state.

Right now, the <4.14 fallback seems strictly worse.

(Disregarding that this is mostly academic, and that whoever is calling fork()
while using your library is unlikely to have paid the same attention to
detail when it comes to stuff like ephemeral keys.)

~~~
colmmacc
I couldn't make this work reliably ... the kinds of applications that call
clone() directly and bypass the pthreads stuff are language runtimes and VMs
that want to emulate their own lightweight processes. They also have their own
page fault handling, and documentation says that behavior is undefined when
these things start interacting recursively. It's very difficult for a library
to exert control.

At the same time, these environments often use native crypto that's implemented
in C, for performance, and so it all comes together. Of course the other
mitigations present still work, and we're talking about a very, very obscure
and unlikely set of circumstances, but why leave even a tiny door open?

