Hacker News
Myths about /dev/urandom (2uo.de)
186 points by petrosagg 779 days ago | 106 comments

Why hasn't someone qualified re-written that unix man page by now? I've been reading cryptographers trying to explain all of this for a while now. I feel like that would help put a lot of this to rest and save everyone time.

It would have saved me a lot of time, that's for sure. And here I thought I was being responsible and informed by reading the man page.

Honestly, anyone who cares & can understand will look at the implementation of the two and grok it. Those who don't will screw up the rest of it anyway, and that will create more than enough trouble even with some helpful guidance. Best to have them fail in pretty obvious ways.

I would not classify the /dev/random failure modes as "pretty obvious".

I guess it is a matter of perspective. If you understand what you need for your cryptography, you'll know if you want /dev/random or not. The fact that it blocks is very, very well advertised and clear. The fact that the difference is that /dev/urandom won't block is also very, very clear. If you don't know which you want, I'd wager you don't understand your cryptographic protocol's needs terribly well, and are best off outsourcing that work to someone else.

Last time I tried using gpg on a VM it failed to work (literally would not do anything) because it blocked on /dev/random. Would you say that the gpg people should be outsourcing their work to someone else?

Things have improved a little on this front. It turns out gnupg was being a little gluttonous when it came to entropy:

"Daniel Kahn Gillmor observed that GnuPG reads 300 bytes from /dev/random when it generates a long-term key, which, he observed, is a lot given /dev/random's limited entropy . Werner explained that GnuPG has always done this. In particular, GnuPG maintains a 600-byte persistent seed file and every time a key is generated it stirs in an additional 300 bytes. Daniel pointed out an interesting blog post by DJB explaining that a proper CSPRNG should never need more than about 32 bytes of entropy. Peter Gutmann chimed in and noted that a 2048-bit RSA key needs about about 103 bits of entropy and a 4096-bit RSA key needs about 142 bits, but, in practice, 128-bits is enough. Based on this, Werner proposed a patch for Libgcrypt that reduces the amount of seeding to just 128-bits."[^1]

On a related note, why are you generating keys on a remote VM? It's probably not fair to say that gpg "failed to work (literally would not do anything)." It was doing something: gnupg was waiting for more entropy. Needing immediate access to cryptographic keys that you just generated on a recently spun-up remote VM is kind of a strange use case?

[^1]: https://www.gnupg.org/blog/20150607-gnupg-in-may.html

Thanks very much for the updates on entropy requirements.

Re: "Why are you generating keys on a remote VM" - prior to this, it hadn't occured to me I couldn't generate a gpg key on linode/digital ocean, VMs. I realize now that keys should be generated on local laptops (or such), and copied up.

Re: "Fair to say failed to work" - It just sat their for 90+ minutes - I spent a couple hours researching, and found a bug (highly voted on) that other people had run into the same issue. But, honestly - don't you think that gpg just hanging for 90+ minutes for something like generating a 2048 bit RSA key should be considered, "failing to work?" - I realize under the covers (now) what was happening - but 99% of the naive gpg using population would just give up in the same scenario instead of trying to debug it.

Yeah, the bug was really how it handled the case of waiting forever without telling you why. In GPG's defense, before it actually starts reading from /dev/random, it does give you all kinds of warnings that it needs sources of entropy before it can make any progress.

Hard to get that kind of thing right, but fundamentally it did stop you from making exactly the kind of terrible mistake that I was talking about. ;-)

On environments with limited initial entropy (such as VMs) just use haveged [1]

[1] - https://wiki.archlinux.org/index.php/Haveged

Whoa. I didn't realize it was reading 300 bytes. That is excessive.

No, they should outsource their UX work to someone else.

They know their crypto requires good entropy. GPG should timeout or give a warning/option to shoot yourself in the foot in case you wanted to. But that's UX, not crypto.

I thought the entire purpose of this thread was pointing out that /dev/urandom is plenty fine for security purposes, and all blocking on /dev/random does is, well, block your program for no particularly good reason.

/dev/urandom is fine for a lot of contexts. If you were using it for key generation (which I believe is when it reads from /dev/random), particularly inside a VM, that might be a case where you need a small seed that can't be controlled by the VM environment. You don't want two instances of the VM picking the exact same persistent keys.

As the article says, it's a screw up to not have each VM instance seeded separately. You need that to not muck the whole thing up. In the case of generating keys for permanent use, you want to fail in that case, at least until some entropy shows up, as it did.

If you have two VMs with identical /dev/urandom states, that's a grievous vulnerability that will impact far more than GPG. This doesn't seem like a good reason to use /dev/random, but rather a reason to fix whatever distro bug is preventing your (u)random from being seeded properly.

You do not need /dev/random for key generation.

In addition, if two VMs have the same urandom state, that means they must have the same random state. So now you are hoping that somehow those states diverge? But given that they started the same, I'd be wary of making that assumption. There's no guarantee that whatever "entropy" unblocks random on one machine won't be identical to the second machine.

You don't need it, but if you don't have enough entropy for key generation, something is wrong.

If that's snark, you've missed my point. If urandom on your system is insecure, all sorts of other software on that system is also insecure.

Yes, so the question is what the preferred failure behaviour is when there isn't even enough entropy in the system to select a single unpredictable key with any confidence of uniqueness within an unbounded time window. That should mean the system effectively has no source of entropy.

I think blocking and not returning until you have entropy is a reasonable failure behaviour for gpg in the key generation process.

It'd be nice to report something and maybe hint to the user that you are waiting for just a modicum of entropy to show up, but at least it isn't presenting a key that is entirely predictable (and worse still, the SAME as any other instance of that VM image!!!) to anyone with access to the original VM.

The bug the article is referring to is that a lot of security systems will block reading /dev/random when in fact /dev/urandom will provide a securely unpredictable sequence of data with no statistical likelihood of another system producing the same sequence. It's particularly bad for the case where timeliness is an important part of the protocol (which is largely a given for anything around, say, a TCP connection). That's a silly design flaw.

I think I see what you are saying - that gpg blocking, and failing to create a key on a VM, is actually a desired behavior, and that the only real problem is gpg doesn't time out more quickly and say something like, "System not providing sufficient entropy for key generation".

But, if that's the case, then the entire thesis behind, "Use /dev/urandom" is incorrect. We can't rely on /dev/urandom, because it might not generate sufficiently random data. /dev/random may block, but at least it won't provide insecure sequences of data.

This is kind of annoying, because I was hoping that just using /dev/urandom was sufficient, but apparently there are times when /dev/random is the correct choice, right?

/dev/urandom will generate secure random data. That's what it does.

That was the point of the blog post: if you are using /dev/random as input to a cryptographic protocol that will stretch that randomness into many more bytes, WTF do you think /dev/urandom is already doing?

What /dev/urandom might fail to do, and this primarily applies to your specific case of a VM just as it is first launching and setting things up, is generate unpredictable data, and most terribly, it might generate duplicate data, for certain specific use cases where that would just be awful.

I would agree that you got the gist right though: /dev/urandom is usually the right choice, but when it is not, /dev/random is indeed needed. Most people misunderstand the dichotomy as "random" versus "guaranteed random", which leads to very, very unfortunate outcomes. Other people misunderstand what they are doing cryptographically and somehow think that a cryptographic algorithm that uses a small key to encrypt vast amounts of data shouldn't have any concerns about insecure, non-random cyclic behaviour, but oddly view /dev/urandom with a jaundiced eye. It basically amounts to "I think you can securely generate a random data stream from a small source of entropy... as long as that belief isn't embodied in something called urandom".

Again, if you don't know the right choice, you should pass the baton to someone who does, because even if you make the right choice, odds favour you doing something terrible that compromises security.

I think the key point you made was this: "What /dev/urandom might fail to do...is generate unpredictable data."

That was not something that I was aware of, thanks.

Argh. No.

So, contrary to the article, you are claiming that there are good reasons to use /dev/random some times?

There are very specific use cases where the concern is not about entropy in the series (because duh!), but about having a small seed value that is unpredictable (not quite the same thing as cryptographically secure), even in a controlled setup like the startup of a VM image.

I would agree with the article that the right way to fix the problem is to seed each instance of the image on startup, but that will also avoid you having a problem with it blocking.

That's a special case though, where you aren't in the middle of a cryptographic handshake, you don't have real time constraints, and the fix to the real problem will also mean there is no problem. Don't use it for a network service's source of randomness.

Actually, the preferred and recommended way to get randomness on modern kernels is to use the new getrandom(2) system call, with the flags argument set to zero.


With the flags set to zero, it works like the getentropy(2) system call in OpenBSD. In fact, code that uses getentropy(buf, buflen) can be trivially ported to Linux as getrandom(buf, buflen, 0).

This is all fine and dandy for seeding a CSPRNG, but we still don't actually have a CSPRNG in the C stdlib or glibc. getrandom(2) seems to target exclusively getting entropy for seeding, looking at the man page ("These bytes can be used to seed userspace random number generators or for cryptographic purposes."). And of course, the syscall is very recent, introduced in Linux 3.17, coming after the kernels in Debian stable (jessie), which features 3.16, and CentOS/RHEL7, which features 3.10.

OpenBSD and NetBSD feature a ChaCha-based arc4random; FreeBSD and libbsd still seem stuck with RC4-based arc4random[1]. An equivalent for that is sorely missing on Linux. /dev/urandom requires messing with file descriptors, which you may run out of and which require error handling, plus all kinds of security precautions to make sure you're actually looking at the right /dev/urandom[2].

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=182610 and https://bugs.freedesktop.org/show_bug.cgi?id=85827

[2] http://insanecoding.blogspot.ch/2014/05/a-good-idea-with-bad...

There is an open request to add a posix_random (basically arc4random) to posix http://austingroupbugs.net/view.php?id=859

getrandom(2) uses a CSPRNG under the hood and can be treated as a pre-seeded one.

It's unfortunate that it's so recent, though.

If the right way to use the Linux system call is to use it with flags set to such that it becomes equivalent to the OpenBSD system call,

1) Why does the Linux system call have flags at all?

2) Why does the Linux system call have a different name?

I.e. why not clone the OpenBSD system call exactly with the same name and semantics and not give the userland any flags to shoot self in the foot with?


(you certainly don't have to agree with that, currently all the flags can do is draw from /dev/random instead of /dev/urandom, and do so without blocking, but that's the reasoning behind the interface change and thus the name change)

It's pretty sad that the embedded system problem is exposed to the userland, which isn't a reasonable place to deal with the problem. A better solution would be to require the bootloader to supply the kernel with a seed (as, I gather, OpenBSD deals with the problem) and not support hardware that doesn't have at least one of 1) hardware rng or 2) 256 bits of non-volatile storage for storing a seed across boots available to the bootloader.

Unfortunately, at the moment it's not a safe bet that any particular system is running a modern enough kernel, IMO.

getrandom() showed up in 3.17, and the latest CentOS distribution ships with 3.10. And then you've got older deployments.

You could descend into autoconf hell, detecting getrandom() and compiling for it where available, but I can't imagine it being worth the effort at the moment if you're going to be maintaining a branch of fallback code anyway.

Why didn't we get /dev/getrandom?

One point of a direct syscall is that you don't have to open any files. That protects you from file descriptor exhaustion attacks.

Another point is that it still works if you're inside a chroot / jail / container / with no devices visible.

In a way, that's the point of chroot/jail/container: isolate the host filesystem from the guest, so the script that creates it should take /dev into account if it is needed

The point is to isolate access to the host, and secure random numbers have nothing to do with the host.

The root problem is that cryptography is not usable without some understanding. Lately we have developed an addiction to "best practices" like "use bcrypt" or "use /dev/random", which in the field of cryptography are not enough: understanding is the only possible path. Actually, this is true for most things; it just shows up more obviously in cryptography because of the security concerns. So we should replace "use bcrypt" with "read Applied Cryptography and other books".

The average guy doesn't have the time to read enough about cryptography to have the sort of knowledge you're wanting. This very article debates an understanding created by incomplete and poor cryptographic knowledge.

If there's anything I've learned by studying cryptography, it's that the average person needs to invest significant resources to become a pseudo-expert in cryptography - all before they can make any meaningful decision.

Applied Cryptography is not a terribly good starting point for cryptographic understanding these days. That's just another facet of difficulty: where does one even start? Do you want a theoretical baseline understanding? Do you want the high-level, quick-and-dirty overview which only gives a summary? (That'll make it hard to make real, informed decisions...)

If we really want to make cryptography accessible to many developers, the best solution is for the cryptographic community to make our libraries and interfaces better. At the same time, there comes a point where a developer has to stop and say "this is beyond what is standardized in ${widely used library}; we need to hire a cryptographer." In that sense, we need to instill a better anti-crypto ethic.

Basically, to become an expert cryptographer you need a math degree and ten years of experience, so that is out of the question indeed. What I'm referring to is getting enough information to understand the big picture: what a stream cipher is, what a block cipher is, what a cryptographic hash function is and its main properties, how many of those primitives are somewhat interchangeable so you can use one to build another, the tradeoff between speed and security (and how the number of rounds affects the security of crypto building blocks), analyzing simple algorithms in order to really understand why it is so hard to create something secure, secure versus weak PRNG generation (and how to break a linear congruential generator), algorithms like DH and RSA, basic number theory, and so forth. This will not make you an expert, but it will give you enough understanding to see why a rule or best practice exists and when it is safe or not to break it.

About the starting point, this is incredibly sad but true: there is no Applied Cryptography of 2015. The book is at this point somewhat outdated and no replacement exists; what you can do is read it, and then read the documents that are around to get updated information. There are also online courses on cryptography now that really help. This may look like overkill, but at this point crypto is everywhere and is the foundation of most things secure, so it is a requirement for everybody involved with computer security.

Well, this is the point of nacl (and successors). You don't need to know that it's using a stream cipher. You only need to know you want to send a secret, and this is the function that does that.

How secret is it? How much effort do you want an adversary to have to spend versus your intended recipient? How do you want to manage keys between yourself and the recipient? What is the size of the secret? How much do you trust the channels over which you are sending the message? Do you need to validate the identity of the secret's recipient? How many secrets do you need to send to how many recipients each minute?

All of these influence how the secret should be bundled up and sent, and it takes more than a library to pick the appropriate method.

OK, I admit, if you like you can make it much harder than it needs to be.

The average person doesn't have the time needed to learn how to build a house, either. As a society, we solve this problem by delegating the construction industry to a group of people who specialize in it, and whom we can trust to do things properly.

There's no way around the fact that security is complex. If it's too time consuming for the average developer, then they shouldn't be doing it at all.

I think for ordinary programmers, stories of failures might be more effective. Case studies, perhaps even a bit embellished to make them more compelling. Something along the lines of Kurzweil's cuckoo's egg. Except shorter and more recent.

I think what you're getting at is that a specific technology won't save you; it's the security mindset that's needed. Convincing programmers there really are bad people who want to pick apart their systems is the problem. Once they're convinced, once they take security seriously, they'll do better.

They might start out doing a terrible job, but with the security mindset they'll improve. They'll seek out problems and solve them, rather than pretending it's not an issue or blindly applying security secret sauce like "use bcrypt".

What is Kurzweil's cuckoo's egg

Do you perhaps mean Cliff Stoll?

Yes! My pre coffee memory failed me. Thanks.

Thanks for this. Time to go fix my code[1].

[1]: https://github.com/andrewrk/genesis/blob/0d545d692110d33068d...

Since you're only targeting Linux, if you are fine with requiring recent kernels (3.17), you should probably use getrandom(2): http://man7.org/linux/man-pages/man2/getrandom.2.html

I prefer to trust the NSA on these matters. They end up saying much of what the author has written, but they make it clear why you want to use one vs the other.

The excerpt below is from https://www.nsa.gov/ia/_files/factsheets/I43V_Slick_Sheets/S... (which in turn also references https://www.nsa.gov/ia/_files/factsheets/I43V_Slick_Sheets/S... )

Unix-like Platforms (e.g. Linux, Android, and Mac OS X):

Application developers should use the fread function to read random bytes from /dev/random for cryptographic RNG services. Because /dev/random is a blocking device, /dev/random may cause unacceptable delays, in which case application developers may prefer to implement a DRBG using /dev/random as a conditioned seed.

Application developers should use the “Random Number Generators: Introduction for Operating System Developers” guidance in developing this solution. If /dev/random still produces unacceptable delays, developers should use /dev/urandom which is a non-blocking device, but only with a number of additional assurances:

- The entropy pool used by /dev/urandom must be saved between reboots.

- The Linux operating system must have estimated that the entropy pool contained the appropriate security strength entropy at some point before calling /dev/urandom. The current pool estimate can be read from /proc/sys/kernel/random/entropy_avail.

- At most 2^80 bytes may be read from /dev/urandom before the developer must ensure that new entropy was added to the pool.

Maybe I'm just tired & not able to detect sarcasm right now, but isn't 2^80 bytes more bytes than are currently stored in the world? That's on the order of 10^24, which is something like 1 million exabytes, which is a million squared terabytes, right?

The amount of digitally stored information in the world as of May 2009 [1] is in the order of 2^71. There's no way a single process in a Linux distro will live long enough to read 2^80 bytes for the foreseeable future.

[1] http://www.theguardian.com/business/2009/may/18/digital-cont...

No, it's called being safe

And I wouldn't put it past them to have an attack (at least theoretical) that exploits this.

There's safe and there's FUD. Nobody will read 2^80 bytes from urandom; you'll literally run out of time before that. It would take around 1,782,051,134 years on my system.

So if they write that there's a vulnerability after reading 2^80 bytes - that's great! We're secure. If they write that you must ensure to do something after 2^80 bytes - that's complete bullshit.

Yes, reading 2^80 bytes for a practical attack is impossible today (and for the near future).

However, remember when attacks on 3DES and MD5 were only theoretical?

Also, you may not even need to read 2^80 bytes, there might be a (future) vulnerability that allows you to shortcut this.

There are physical limits to our reality, and it's very unlikely those limits can be broken (e.g., the speed of light). Given those limits, 2^80 is large enough that it cannot be surpassed without fully breaking reality (e.g., time travel).

If you can break reality, all bets are off though and trying to defend against attacks that break reality in the future are impossible.

The difference is that weaknesses were found in 3DES and MD5. Increasing computing power was not the main factor. "Only" being able to produce 2^80 random bytes is a known and expected limitation. Sure, the CSPRNG could in theory be found to have a weakness, but that has nothing to do with the 2^80 bytes and the same could be said for virtually any cryptographic algorithm.

What weaknesses in 3DES are you thinking about that yield practical attacks?

I am not arguing that there are; that was the parent comment. However, while the 3DES weaknesses don't yield practical attacks now, they still reduce the effective key length. My point was not that 3DES is exploitable, but that its weaknesses differ from the CSPRNG's 2^80 limit: the latter is not the result of a mistake in the algorithm's design but an expected feature, just like any symmetric cipher with a fixed-size key is "limited" by that key size.

Now, if someone found a lower limit based on exploiting some weakness in the random number generation, the analogy with 3DES and MD5 would make more sense.

Which practical attacks on 3DES are you thinking of?

Just brute force basically

But there seems to be smarter attacks (ref 21) https://en.wikipedia.org/wiki/Triple_DES#Security

Those are not practical attacks. But your argument hinges on there being non-theoretical attacks on 3DES. Are there others you were thinking of?

3DES might have been a bad example.

My point is that even if today some sizes and lengths seem only of theoretical concern, tomorrow there might be a vulnerability, a new approach to the problem, or even natural technological evolution that might turn it into a practical attack (even when it seems impossible today)

But that's true of every cryptographic primitive. You can't make reasoned decisions based on that logic.

From what I recall, the consensus is that the smarter attack is less efficient than brute force overall.

> in which case application developers may prefer to implement a DRBG using /dev/random as a conditioned seed.

Wait, what? /dev/urandom is a DRBG seeded the same way /dev/random is. This doesn't seem to make sense.


You're very unlikely to be continually reseeding your own DRBG with new entropy, so it will be less secure than /dev/urandom, which is.

Suspiciously bad advice, there, from what I can see.

Just as a random counter-example, OpenSSL does use that method. But it actually seeds from urandom rather than random... and by default it fails when forking/threading. :(

But yes, that's not common. (more info: http://wiki.openssl.org/index.php/Random_Numbers)

Is the joke that this is awful advice?

(Except for saving the entropy pool between reboots, which can be useful, afaik.)

> application developers may prefer to implement a DRBG using /dev/random as a conditioned seed

No. Doing random in userland is just wrong. If your program has access to /dev/urandom, use it. If not, use arc4random().

> “Random Number Generators: Introduction for Operating System Developers”

Or look how OpenBSD does it. (getentropy(), arc4random(), the subsystem)

> The entropy pool used by /dev/urandom must be saved between reboots.

OpenBSD does this, and more. The bootloader basically seeds the kernel with old entropy from before the reboot.

> - The entropy pool used by /dev/urandom must be saved between reboots.

Every single Linux distro does this.

Why doesn't Linux follow FreeBSD and provide a single interface (under two names, maybe, for compatibility)? Is there any reason to use /dev/random?

Ego. They've put a lot of work into the entropy-tracking thing and they don't want to admit that it wasn't needed.

(Ego is the reason for a bunch of other security misfeatures in Linux. Securelevel comes to mind, where Linus explicitly said after fifteen years, okay, this was in fact the right model all along, you can merge it, just call it something other than securelevel so I don't have to eat my words about securelevel being a mistake.)

Is there a patch to make /dev/random behave like /dev/urandom? I've had at least some server software stall blocking on /dev/random, and I already build my own kernels, so incorporating such a patch is easy.

  rm /dev/random && mknod -m 444 /dev/random c 1 9

This won't survive a reboot; you need udev rules.

Or run the app in its own namespace with custom /dev.

Although in most cases it doesn't matter, I wouldn't do this. Remember that the PRNG will only output unpredictable values as long as the initial seed was unpredictable (and unknown). As stated in the article, there is actually one time where you care if you have enough entropy -- the first time you grab a random number. After that the PRNG is seeded with an unknown value and you are fine (assuming you waited until you had enough entropy). Before that, you aren't.

BSD's implementation seems sensible -- block the first time and then don't block again. Or potentially block if you haven't reseeded in X amount of time (X being configurable), to defeat the situation where someone figures out your seed. Although, I have a tough time understanding how someone will figure out your seed without completely compromising your system anyway... but systems are supposed to be robust against attacks that one can't immediately imagine ;-)

Is there a linux kernel patch to use BSD's implementation then?

I use a solution based on udev rules, here: https://superuser.com/questions/309840/how-can-i-point-dev-r...

You could probably edit udev rules to make random the same device major/minor as urandom.

An easy hack is to use rngd with /dev/urandom as an "entropy source" (risky if you are going to use /dev/random for anything that REQUIRES true entropy, but otherwise should be safe as the article suggests).

mount --bind /dev/urandom /dev/random

That won't stick across reboots, and at that point, I think you can just rm /dev/random; ln -s /dev/urandom /dev/random.

But couldn't you put that in /etc/fstab?

You could. I also find myself binding /dev/urandom into /dev/random in a container FWIW.

I could do a udev rule to just mknod random with the urandom device instead.

maybe it is doable with overlayfs?

Is there anyone that, with full access to the machine/kernel, has managed to predict the output of random/urandom?

I know that a PRNG is predictable if you know all the input variables, and the code for it is publicly available, but has anybody in practice been able to exploit that?

EDIT: that's an honest question. I'd like to read a paper about that.

Interesting and edifying, thank you. I wonder how the recent 4.2 kernel release affects this: Linux 4.2 Released Improving Cryptography Options[1]

[1]: http://www.linuxplanet.com/news/linux-4.2-released-improving...

Very interesting article. Clears up the preconceptions I had about using /dev/random over /dev/urandom ("it's more secure!") and explains why in very straightforward language.

Java is known to read /dev/random when dealing with the SecureRandom class. In particular, it can cause extremely slow startup times for Tomcat on a fresh virtual machine. There's the parameter "-Djava.security.egd=file:/dev/./urandom", and I always felt unsafe using it. Thanks to this article, now I won't regret it even theoretically.

Fun thing is, if you pass "/dev/urandom" to this parameter, Java will read /dev/random anyway. Maybe that was a wise decision 20 years ago.

This is my understanding: /dev/urandom is just a blind producer, returning numbers from some pool; /dev/random is a producer as well as an inspector: it returns the same numbers from the pool, but it also keeps an extra eye on the source feeding that pool and refuses to work if it believes the source is of low quality -- here, not enough entropy.

The problem is that /dev/random is too paranoid. It refuses to work if there isn't enough entropy in the pool, yes. But when it produces numbers, it subtracts that amount from the supposed amount of entropy in the pool. Thus it will refuse to work again pretty soon in a lot of circumstances where it's still completely safe to proceed.

For a concrete example: you start out with zero bits in the pool. /dev/urandom will produce a predictable stream of bits. This is extremely bad. /dev/random will block. This is good.

Now you add 1024 bits to the pool. Both /dev/random and /dev/urandom will produce good numbers.

Now let's say you read 1024 bits from /dev/random. This will reduce the entropy counter back to zero. If you then try to read another 1024 bits from /dev/random, it will block.

But this is nonsense! Those 1024 bits you added before aren't depleted just because you pulled 1024 from /dev/random! It is perfectly safe to proceed generating more numbers at this point (or at least it's as safe as it was before), but /dev/random refuses to.

Many systems don't add entropy to the pool very quickly so it's entirely possible to "deplete" it in this way. Then your code using /dev/random wedges and you have a problem. However, few systems are ever in a state where they have zero entropy. Thus the advice to use /dev/urandom.
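The accounting described above can be sketched as a toy model (the class and method names here are invented for illustration; the real kernel pool, mixing, and estimator are far more involved):

```python
# Toy model of Linux's entropy accounting -- illustration only.

class ToyRandomDevice:
    def __init__(self):
        self.entropy_bits = 0  # the kernel's *estimate*, not real entropy

    def add_entropy(self, bits):
        self.entropy_bits += bits

    def read_random(self, bits):
        """Model /dev/random: 'block' (here: raise) when the counter runs out."""
        if bits > self.entropy_bits:
            raise BlockingIOError("not enough entropy; /dev/random would block")
        self.entropy_bits -= bits  # reads "deplete" the counter
        return bits

    def read_urandom(self, bits):
        """Model /dev/urandom: never blocks, counter is untouched."""
        return bits

dev = ToyRandomDevice()
dev.add_entropy(1024)
dev.read_random(1024)       # fine: the counter drops from 1024 to 0
try:
    dev.read_random(1024)   # "blocks" -- even though the CSPRNG is still safe
except BlockingIOError as e:
    print(e)
dev.read_urandom(1024)      # urandom keeps producing regardless
```

The point of the sketch is that the counter, not any real exhaustion of randomness, is what makes the second read block.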

Ideally, you'd want a device which blocks if and only if the entropy pool hasn't been properly seeded yet, but which never blocks again after that point. Apparently this is what the BSDs do, but Linux doesn't have one.

> Now you add 1024 bits to the pool.

A further complication - you don't actually add 1024 bits to the pool. Instead, you add N bits to the pool (N > 1024), estimate its entropy as 1024, and increase the overall entropy estimate by that much.

Estimates are always pessimistic, because it's better to be cautious and under-estimate the entropy than the alternative. So you might have 2048 bits or more of real entropy, but that can still be "depleted" by a request for 1024 bits.

The paper that describes Fortuna [0] goes into greater detail on this. Yarrow, the CSPRNG that preceded Fortuna, attempted entropy estimation. Fortuna rejects entropy estimation, and tries to build a CSPRNG that is secure without needing estimates:

"...making any kind of estimate of the amount of entropy is extremely difficult, if not impossible. It depends heavily on how much the attacker knows or can know, but that information is not available to the developers during the design phase. This is Yarrow’s main problem. It tries to measure the entropy of a source using an entropy estimator, and such an estimator is impossible to get right for all situations."

"Fortuna solves the problem of having to define entropy estimators by getting rid of them."

0: https://www.schneier.com/fortuna.pdf

> Fortuna rejects entropy estimation, and tries to build a CSPRNG that is secure without needing estimates

What really annoys me is that a Fortuna-based CSPRNG was contributed to Linux eleven years ago this month, but died on the vine, IIRC because the Linux kernel team were so fond of the entropy-estimation approach.

> Ideally, you'd want a device which blocks if and only if the entropy pool hasn't been properly seeded yet, but which never blocks again after that point. Apparently this is what the BSDs do, but Linux doesn't have one.

The getrandom() call added in Linux 3.17 does this. By default it reads from urandom, but it blocks until it estimates that urandom has been seeded with "a sufficient number of bits", which seems to be around 128 bits of randomness.


When available getrandom() should probably be preferred over both /dev/random and /dev/urandom.
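That behavior is exposed to Python as os.getrandom() (Python 3.6+, Linux only). A hedged sketch with a portable fallback:

```python
import os

def strong_random_bytes(n):
    """Prefer getrandom(2), which blocks only until the pool has been
    seeded once and never again; fall back to os.urandom() elsewhere."""
    if hasattr(os, "getrandom"):
        # flags=0: read the urandom source, but block until it is seeded
        return os.getrandom(n)
    return os.urandom(n)

key = strong_random_bytes(32)
print(len(key))  # 32
```

Note that strong_random_bytes is an invented helper name; the relevant primitives are os.getrandom and os.urandom.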

"But this is nonsense! Those 1024 bits you added before aren't depleted just because you pulled 1024 from /dev/random!"

An observer of the produced random numbers can potentially deduce the next numbers from the first 1024 random numbers. That is why /dev/random requires more randomness to be added -- to prevent the next number from being guessed.

No, they can't. That's not how crypto DRBGs work. If you could do that, you'd have demonstrated a flaw in the entire /dev/random apparatus, not a reason not to use urandom.

Think of a CSPRNG almost exactly the way you would a stream cipher --- that's more or less all a CSPRNG is. Imagine you'd intercepted the ciphertext of a stream cipher and that you knew the first 1024 plaintext bytes, because of a file header or because it contained a message you sent, or something like that. Could you XOR out the known plaintext, recover the first 1024 bytes of keystream, and use it to predict the next 1024 bytes of keystream? If so, you'd have demolished the whole stream cipher. Proceed immediately to your nearest crypto conference; you'll be famous.

Modern CSPRNGs, and Linux's, work on the same principle. They use the same mechanisms as a stream cipher (you can even turn a DRBG into a stream cipher). The only real difference is that you select keys for a stream cipher, and you use a feed of entropy as the key/rekey for a CSPRNG.
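To make the analogy concrete, here is a minimal hash-based DRBG sketch -- a toy using SHA-256 in counter mode, not Linux's actual construction -- that doubles as a stream cipher: the same keystream that serves as RNG output encrypts a message when XORed with it.

```python
import hashlib
from itertools import count

def keystream(seed: bytes):
    """Toy DRBG: hash (seed || counter) to produce an endless byte stream."""
    for i in count():
        yield from hashlib.sha256(seed + i.to_bytes(8, "big")).digest()

def random_bytes(seed: bytes, n: int) -> bytes:
    """Used as an RNG: take the next n keystream bytes."""
    ks = keystream(seed)
    return bytes(next(ks) for _ in range(n))

def xor_cipher(seed: bytes, data: bytes) -> bytes:
    """Used this way, the very same DRBG *is* a stream cipher."""
    ks = keystream(seed)
    return bytes(b ^ next(ks) for b in data)

seed = b"entropy fed in from the kernel pool"
msg = b"attack at dawn"
ct = xor_cipher(seed, msg)
assert xor_cipher(seed, ct) == msg  # XORing twice recovers the plaintext
# Knowing keystream bytes 0..13 (ct XOR msg) tells you nothing about byte 14
# unless you can invert SHA-256 -- the same property a CSPRNG relies on.
```

Recovering future output from past output of this toy would require inverting the hash, which is exactly the "famous at a crypto conference" break described above.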

It's facts like this that make the Linux man page so maddening, with its weird reference to attacks "not in the unclassified literature".

In theory yes. In practice you're talking about a massive cryptographic break, to the extent that it's not worth worrying about if you're going to be using those random numbers for anything that involves real-world cryptography. If you can't trust your CSPRNG to be secure when your attacker gets ahold of 1024 bits of output then you can't trust anything you're going to do with the output of your random number generator anyway.

> An observer of the produced random numbers can potentially deduce the next numbers from the first 1024 random numbers.

By definition, a cryptographically secure pseudorandom number generator cannot be predicted like that by a computationally bounded attacker.

Thus if any attacker could deduce the next number from /dev/random by observing the numbers before it, the algorithm it adopts is fundamentally broken, and nothing could save its security in that case.

http://www.2uo.de/myths-about-urandom/#computationally-secur... together with http://www.2uo.de/myths-about-urandom/#low-entropy refutes that point.

(That's why I've chosen this structure for the essay, and have included anchors)

It should be roughly as hard as cracking a hash of 1024 bits, i.e. very hard.
