
Linux /dev/urandom and concurrency - drsnyder
http://drsnyder.us/2014/04/16/linux-dev-urandom-and-concurrency.html
======
kijin
If your program needs to read 4K from /dev/urandom multiple times per second,
you're doing it wrong. There is little benefit in reading anything over 32
bytes at a time.

According to the man page for /dev/random and /dev/urandom:

> _no cryptographic primitive available today can hope to promise more than
> 256 bits of security, so if any program reads more than 256 bits (32 bytes)
> from the kernel random pool per invocation, or per reasonable reseed
> interval (not less than one minute), that should be taken as a sign that its
> cryptography is not skillfully implemented._

------
Glyptodon
So why is there a lock on reads from urandom? I suppose that if there weren't
a lock, concurrent reads would all get the same random values?

~~~
bodyfour
Yeah, basically. That could be a disaster for, say, nonce generation.

The solution would be to have multiple independent entropy pools and either
bind them to cores (or sets of cores) or pick a non-busy one under
contention.

~~~
acqq
Yes, if there is no per-core urandom generator, it could be convenient for
some extreme cases to introduce one. The question is whether it's worth the
effort and the resulting "bloat" in kernel code and memory usage. Linux runs
on some very small devices too, and even there decent user-space programmers
can easily do their own per-thread generation in their programs. Normal use of
crypto looks like this: you initialize your own crypto once, then produce a
lot of data in your own space.

If urandom really is "one for all cores", somebody should be able to
demonstrate the speed drop with just a short bash script? Volunteers?

~~~
claudius
It seems to work, at least in part. For /dev/urandom I always see roughly the
same overall throughput, regardless of concurrency:

    
    
      $ time dd if=/dev/urandom of=/dev/null bs=1 count=10000000
      real	0m10.640s
      user	0m0.696s
      sys	0m9.940s
    
      $ time (for i in $(seq 1 50); do dd if=/dev/urandom of=/dev/null bs=1 count=200000 2>/dev/null & done; wait)
      real	0m11.199s
      user	0m1.232s
      sys	0m42.828s
    
      $ time (for i in $(seq 1 500); do dd if=/dev/urandom of=/dev/null bs=1 count=20000 2>/dev/null & done; wait)
      real	0m11.234s
      user	0m1.252s
      sys	0m42.536s
    

whereas for /dev/zero:

    
    
      $ time dd if=/dev/zero of=/dev/null bs=1 count=10000000
      real	0m3.268s
      user	0m0.660s
      sys	0m2.604s
    
      $ time (for i in $(seq 1 50); do dd if=/dev/zero of=/dev/null bs=1 count=200000 2>/dev/null & done; wait)
      real	0m2.550s
      user	0m1.192s
      sys	0m8.760s
    
      $ time (for i in $(seq 1 500); do dd if=/dev/zero of=/dev/null bs=1 count=20000 2>/dev/null & done; wait)
      real	0m2.612s
      user	0m1.228s
      sys	0m8.112s
    

Of course, the bash for-loop here together with the forking has some
considerable overhead, so these values should likely be interpreted carefully
(Linux 3.14-rc7, Core i5 520M).

------
sebcat
As a user of libcares (which is awesome for bulk DNS lookups btw) I'll add
that I've only ever needed one ares_channel per process. Having one
ares_channel for every CURL-handle seems a bit excessive. This is probably the
main problem here, not the kernel spinlock.

Edit: Come to think of it, why isn't the CURL-handle reused? It sounds like a
new CURL-handle is inited for every request, which I don't recall being
necessary.

~~~
drsnyder
The curl handle should be re-used if possible, so that's also part of the
problem.

------
Mister_Snuggles
A more important question would be "Why does asynchronous DNS resolution
require random data in the first place?"

~~~
mike-cardwell
So you can randomise the ID in the request packet to help protect against
cache poisoning, and so you can apply 0x20 encoding (x) to the qname for
further protection.

(x)
[http://courses.isi.jhu.edu/netsec/papers/increased_dns_resis...](http://courses.isi.jhu.edu/netsec/papers/increased_dns_resistance.pdf)

~~~
bch
Hard to say w/o seeing the data in question, but based on that, perhaps nscd
or re-using curl handles could mitigate their frustration w/ runtime.

~~~
frankfarmer
the c-ares init (which reads /dev/urandom) inside curl init happens even when
DNS isn't used at all (even when making a request to 127.0.0.1), so it's
pretty hard to avoid as long as curl is built with c-ares. The only way to
mitigate is to remove c-ares or limit calls to curl init.

~~~
acqq
Isn't the solution exactly what is described here:

[http://curl.haxx.se/libcurl/c/curl_easy_init.html](http://curl.haxx.se/libcurl/c/curl_easy_init.html)

"If you did not already call curl_global_init(3), curl_easy_init(3) does it
automatically. This may be lethal in multi-threaded cases, since
curl_global_init(3) is not thread-safe, and it may result in resource problems
because there is no corresponding cleanup."

I'd guess that only curl_global_init reads from urandom? Your application
should call curl_global_init only once, then perform each subsequent fetch
using just curl_easy_init and the corresponding cleanup.

~~~
frankfarmer
We're using libcurl via PHP (I know, I know) which doesn't expose
curl_global_init at all.

Here's the stacktrace for the /dev/urandom read -- it's happening in
curl_easy_init.

    
    
       Catchpoint 1 (call to syscall 'ioctl'), 0x0000003a74ecc4ba in tcgetattr () from /lib64/libc.so.6
       (gdb) backtrace
       #0  0x0000003a74ecc4ba in tcgetattr () from /lib64/libc.so.6
       #1  0x0000003a74ec7a1c in isatty () from /lib64/libc.so.6
       #2  0x0000003a74e60d51 in _IO_file_doallocate_internal () from /lib64/libc.so.6
       #3  0x0000003a74e6d6dc in _IO_doallocbuf_internal () from /lib64/libc.so.6
       #4  0x0000003a74e6ba7c in _IO_file_xsgetn_internal () from /lib64/libc.so.6
       #5  0x0000003a74e61dd2 in fread () from /lib64/libc.so.6
       #6  0x0000003341606414 in ares_init_options () from /usr/lib64/libcares.so.2
       #7  0x0000003d3404f0c9 in ?? () from /usr/lib64/libcurl.so.4
       #8  0x0000003d340242a5 in ?? () from /usr/lib64/libcurl.so.4
       #9  0x0000003d3402f9a6 in curl_easy_init () from /usr/lib64/libcurl.so.4
       #10 0x00002b35e0304fb0 in ?? () from /usr/lib64/php/modules/curl.so
       #11 0x0000000000606da9 in ?? ()
       #12 0x00000000006456b8 in execute_ex ()
       #13 0x00000000005d2bba in zend_execute_scripts ()
       #14 0x00000000005769ee in php_execute_script ()
       #15 0x000000000067e44d in ?? ()
       #16 0x000000000067ede8 in ?? ()
       #17 0x0000003a74e1d994 in __libc_start_main () from /lib64/libc.so.6
       #18 0x0000000000422b09 in _start ()
       (gdb)

~~~
acqq
I'm just guessing, but maybe the parts missing from the stack trace (marked
with ?? -- are you missing some debug symbols?) do what the manual entry I
quoted describes (calling the global init, which calls ares?).

And why curl at all? Doesn't PHP have a built-in HTTPRequest?

~~~
ZoF
HTTPRequest isn't technically built in to PHP.

I agree that they aren't using the correct technology for their use case
though.

~~~
acqq
Yes, being a C programmer I knew curl was overkill, but not what the best
lightweight alternative was; now I believe it's:

[http://at2.php.net/stream_socket_client](http://at2.php.net/stream_socket_client)

------
aidenn0
Seed a secure userspace PRNG from urandom, perhaps?

~~~
hosay123
Adding to aidenn0's comment: if you trust /dev/urandom to produce 4 KB of
random data, it follows that you trust it to produce 256 bits.

256 bits (32 bytes) is sufficient to initialize a PRNG into any one of
115792089237316195423570985008687907853269984665640564039457584007913129639936
states (a 78-digit number). Consequently, hitting the kernel constantly
for so much data is utterly inefficient in the first instance, and totally
unnecessary in the second.

The blog author could improve his design's efficiency >128x just by seeding a
PRNG with a single 32-byte read at the start of the subprocess.

~~~
mcpherrinm
Userland PRNGs are one of the easiest ways to introduce security
vulnerabilities into your programs. I would recommend being very, VERY careful
before trying to do this, like the traditional "Don't roll your own crypto"
advice.

~~~
raverbashing
Of course. But there are plenty of uses for random numbers where the numbers
don't need to be secure.

~~~
azinman2
In which case rand and the like really should be renamed insecure_random to
prevent confusion.

~~~
raverbashing
Fine by me

------
mike-cardwell
Interestingly enough, I have actually been working on writing a DNS client
library in C++ with Boost ASIO this very afternoon. I was going to get my
source of random data using the following C++11 standard library code. I would
really appreciate any comments from people here if there is anything wrong
with what I'm doing:

    
    
      #include <cstdint>
      #include <random>
    
      std::uniform_int_distribution<uint32_t> dist;
    
      // Seed a Mersenne twister PRNG with random data:
      std::mt19937 eng;
      std::random_device rd;
      eng.seed(dist(rd));
    
      // Now to generate random numbers, simply:
      uint32_t random_number = dist(eng);

~~~
aidenn0
I don't know what DNS uses the randomness for, but if a malicious attacker can
gain from guessing the randomness, don't use MT, as the state can be extracted
from MT by observing a relatively small number of outputs.

~~~
mike-cardwell
Ah. You appear to be right. I'm glad I asked now.

[edit] I'm going to skip using the Mersenne twister engine and just use
std::random_device for all random data, instead of as a seed. It seems on
Linux at least that random_device is basically /dev/urandom. I assume the
source will be sane on other OS's too.

~~~
pbsd
The C++ standard gives few guarantees about the quality of
std::random_device. As I recall, MinGW on Windows implements it as a Mersenne
Twister with a hardcoded seed.

boost::random_device, on the other hand, has better guarantees: it is only
implemented on platforms with a decent entropy source.

~~~
mike-cardwell
Thanks for the info. Seeing as I'm already using boost, that sounds very
useful.

------
rcoh
On many Linux systems, the C library's rand() function is guarded by a global
lock. As a result, if rand() is called in performance-critical parallel code,
performance will tank as each thread attempts to acquire that lock. And even
without the lock, you would have a race condition on the generator's state and
could produce bad (non-random) output.

Use rand_r(unsigned int *state) instead in parallel and concurrent
applications.

Sources: man 3 rand [unix command] [http://unixhelp.ed.ac.uk/CGI/man-
cgi?rand+3](http://unixhelp.ed.ac.uk/CGI/man-cgi?rand+3)

~~~
ekimekim
The problem you're describing is similar but not the same as the one in the
article. What you describe is part of the libc implementation of rand(3),
whereas the article is talking about reads from /dev/urandom, which has a lock
inside the kernel code (for the same reasons as libc).

------
X-Istence
I love how Theodore Ts'o suggests using a user space PRNG that is seeded from
/dev/urandom. OpenBSD are ripping out all of the user space PRNG stuff from
OpenSSL in favour of arc4random_buf()...

~~~
clarry
arc4random_buf() operates in userspace (in this case; it also exists in the
kernel). It is seeded from the kernel, using a sysctl.

------
ape4
Why does he need so much pseudorandomness? And why use /dev/urandom directly?
Maybe using the random library from the programming environment would make
more sense.

~~~
frankfarmer
Simply initializing a curl handle causes the /dev/urandom read -- so a large
number of parallel curl requests easily triggers this issue.

~~~
ape4
Thanks for the reply.

------
rafekett
Overreliance on /dev/urandom when little entropy is available is a well-known
performance problem on servers. That's why
[http://en.wikipedia.org/wiki/Hardware_random_number_generato...](http://en.wikipedia.org/wiki/Hardware_random_number_generator)
exists.

~~~
claudius
If I understand that problem correctly, it has nothing to do with the amount
of entropy available but is a simple synchronisation/locking issue. Were reads
from, say, /dev/zero ‘protected’ by spinlocks in the same way, the same issue
would arise. Conversely, I don’t see how adding a hardware RNG to the system
could alleviate the locking issue.

------
bcl
The code he pointed to is for kernel 2.6.18, which at this point could be
considered ancient history. If you look at current master -
[https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux....](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/char/random.c?id=refs/tags/v3.15-rc1#n1365)

it looks like it has been refactored somewhat, although the lock is still
there.

