LibreSSL's PRNG is Unsafe on Linux (agwa.name)
179 points by agwa 1103 days ago | 146 comments



There's also pthread_atfork. Use that to reset the PRNG. It's a bad interface, but it'll work for this purpose. It bothers me when people very solemnly and seriously condemn systems for problems that are, in fact, amenable to reasonable solutions.
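A minimal sketch of the pthread_atfork idea, assuming a hypothetical library that tracks PRNG freshness with a single flag. The names `prng_stale`, `prng_install_fork_handler`, `prng_mark_seeded`, `prng_is_stale` and `child_sees_stale_prng` are invented for illustration and are not LibreSSL's API; only `pthread_atfork`, `fork` and `waitpid` are real calls.

```c
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping: nonzero means "reseed before producing output". */
static volatile sig_atomic_t prng_stale = 1;

/* Runs in the child right after fork(): the inherited state must not be reused. */
static void prng_atfork_child(void)
{
    prng_stale = 1;
}

/* Register the child handler once, e.g. during library initialisation.
 * Returns 0 on success, an error number otherwise (as pthread_atfork does). */
int prng_install_fork_handler(void)
{
    return pthread_atfork(NULL, NULL, prng_atfork_child);
}

/* A hypothetical seeding routine would call this after mixing in fresh entropy. */
void prng_mark_seeded(void)
{
    prng_stale = 0;
}

int prng_is_stale(void)
{
    return prng_stale;
}

/* Demo helper: fork and report what the child observed.
 * Returns 1 if the child saw a stale PRNG, 0 if not, -1 on error. */
int child_sees_stale_prng(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(prng_is_stale() ? 1 : 0);

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The handler only flips a flag, so the actual reseed can happen later in a path that is allowed to fail. As discussed further down the thread, a process created by calling clone(2) directly never runs this handler.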


> There's also pthread_atfork.

That requires linking with libpthread, which a single-threaded program would not normally do. Otherwise, it's not a bad suggestion.

Still, on top of everything LibreSSL does to automatically detect forks, it should still expose a way to explicitly reseed the PRNG in an OpenSSL-compatible way, since OpenSSL has made guarantees that certain functions will re-seed the PRNG, and there may be some scenarios where even the best automatic fork detection fails (imagine a program calling the clone syscall directly for whatever reason, in which case pthread_atfork handlers won't be called). Since LibreSSL is billed as a drop-in replacement for OpenSSL, you should not be able to write a valid program that's safe under OpenSSL's guarantees but not when linked with LibreSSL.


The libc on my system, Ubuntu 14.04, exports __register_atfork, which is documented here:

> http://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB...

pthread_atfork itself really should be moved into libc, however. (And POSIX should stop treating it as a redheaded stepchild: it's useful!)


A threading interface was standardized into C 2011, so pthreads should eventually stop being necessary. In a sense, not just POSIX, but C proper has adopted a threading interface.


The C threads interface is missing important functionality --- priority inheritance and static mutex initializers come to mind. C11 threads.h is a lowest-common-denominator portability shim, not a general-purpose pthreads replacement.


threads.h is the lowest common denominator. If you want system-specific functionality, you can always wrap those functions.

Static mutex init can be emulated with the call_once() function, with some limitations of its own of course.
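A rough sketch of that call_once() emulation, assuming a C library that ships C11 `<threads.h>`. The names `lazy_mutex`, `lazy_mutex_init` and `get_lazy_mutex` are made up for the example.

```c
#include <stddef.h>
#include <threads.h>

static mtx_t lazy_mutex;
static once_flag lazy_mutex_once = ONCE_FLAG_INIT;
static int lazy_mutex_ok;

/* Runs exactly once, no matter how many threads race into get_lazy_mutex(). */
static void lazy_mutex_init(void)
{
    lazy_mutex_ok = (mtx_init(&lazy_mutex, mtx_plain) == thrd_success);
}

/* Returns the shared mutex, initialising it on first use.
 * Returns NULL if mtx_init failed. */
mtx_t *get_lazy_mutex(void)
{
    call_once(&lazy_mutex_once, lazy_mutex_init);
    return lazy_mutex_ok ? &lazy_mutex : NULL;
}
```

One of the limitations shows up right away: initialisation now happens at run time and can fail, so every caller has to handle a NULL result, which a true static initializer would not require.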


> That requires linking with libpthread, which a single-threaded program would not normally do. Otherwise, it's not a bad suggestion.

Note that this doesn't hold for Solaris: all of the thread functionality is in libc, and libpthread only exists for historical compatibility at this point.


http://yarchive.net/comp/linux/getpid_caching.html

Reading that thread it seems like direct calls to clone(2) can bypass at least glibc's pid cache (which would likely also break LibreSSL's approach).

Any idea if direct calls to clone(2) also bypass pthread_atfork?


> The key is to acquire resources in places that can fail and use them in places that can't. I'm amazed and disappointed that the LibreSSL people aren't following this basic principle.

Wow, great find! LibreSSL could avoid this by calling the getpid() syscall directly.

> Any idea if direct calls to clone(2) also bypass pthread_atfork?

They do, since atfork handlers are invoked by the userspace wrapper for fork().
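For the getpid suggestion above, a small sketch of pid-based fork detection that asks the kernel directly instead of going through a possibly cached libc wrapper. The helper names (`kernel_getpid`, `prng_record_seed_pid`, `prng_forked_since_seed`) are hypothetical; `syscall(SYS_getpid)` is the real Linux interface.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Pid recorded when the (hypothetical) PRNG was last seeded. */
static pid_t seeded_pid;

/* Ask the kernel for the pid, bypassing any pid cache in the C library. */
static pid_t kernel_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Call right after seeding. */
void prng_record_seed_pid(void)
{
    seeded_pid = kernel_getpid();
}

/* Nonzero if the current process is not the one that seeded the PRNG. */
int prng_forked_since_seed(void)
{
    return kernel_getpid() != seeded_pid;
}
```

This costs a real syscall on every check, and a pid comparison still cannot tell a process apart from a later descendant that happens to be assigned the same pid, so it is a mitigation rather than a guarantee.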


> They do, since atfork handlers are invoked by the userspace wrapper for fork().

I don't think user libraries should try to deal with users subverting the facilities on which they rely. There are defined interfaces to system functionality. Break or bypass these interfaces, and you're on your own. If you subvert the usual API semantics by calling clone(2) directly or bypassing the fork(2) wrappers, you should be cognizant of the implications.


It would not be unreasonable for a runtime or VM (like a JVM for example) to use a native library for TLS (performance reasons). It would also not be unreasonable for a VM to use clone() directly, maybe it's part of how it implements its own threading or co-routines for example.

Combine those two reasonable patterns with LibreSSL, and suddenly you have a vulnerability. This is even more likely when you take into consideration that LibreSSL is intended as a direct replacement for OpenSSL; callers are even less likely to examine the fine print of the documentation for undefined and unsupported behaviour.

Still, the LibreSSL work is commendable and should be appreciated. The real problem is a lack of good regression tests - and there may be a messy future of niggly issues because of that. I've already had to deal with some.

I'll give another tricky example. One of the earliest pieces of functionality LibreSSL ripped out was an in-library DNS cache. It was poorly documented and the assumption was that it was there as a crutch for shoddy OS-level DNS caching. But I think this cache also played another role; it helped certificate validation workflows function. Sometimes endpoints bind different certificates to different IP addresses for the same DNS-name that uses DNS-level load balancing. If you don't make the name resolve to the same IP address consistently, then what can happen is that the first connect() gets certificate "A" and some user-facing UI or validation process authenticates it, but then another connect() gets certificate "B" and the caller logic gets confused.

Of course we could blame the caller; or the folks mixing certificates for the same name, but it doesn't really help; users still experience these problems. Just one example of why it is very hard to remove code in fully backwards compatible ways, even if the change seems very innocuous.


There is a test for this in the test suite so you get a warning.


I would agree, were LibreSSL designing its API from scratch. The problem is that OpenSSL's API provides a way to explicitly reseed the PRNG, so a programmer doing something nutty like bypassing the fork() wrapper has a way to make sure the PRNG is still safe to use. If LibreSSL wants to be a drop-in replacement for OpenSSL, its API needs to provide the same functionality.

Edit: to be clear, none of this is an argument against using pthread_atfork - I just want LibreSSL to provide an explicit way to reseed the PRNG like OpenSSL does.



Why would you want to reseed the PRNG?

If the PRNG is good enough (no visible correlation in the statistics tests you can imagine thrown at it), and it's properly seeded with true randomness, then isn't everything peachy?

I am much more afraid of the seeding part of it than the actual algorithm. The algorithms are well studied by smart people, the actual implementation and seeding aren't always.

The mere fact that one could reseed the PRNG makes me nervous. That could be used in devious ways. But I am not a cryptographer, not even a mathematician, so don't take my word for it!

Am I wrong here? Why?


When your process forks, you end up with two processes with identical state. One or the other will need to reseed or the two processes are going to generate the exact same random stream.
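To make that concrete, here is a toy demonstration. It uses a deliberately non-cryptographic xorshift64 generator (an arbitrary stand-in chosen for brevity, not anything LibreSSL uses), forks, draws one value in each process, and compares them.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Toy xorshift64 state: NOT cryptographic, only here to show state copying. */
static uint64_t toy_state = 0x9E3779B97F4A7C15ULL;

static uint64_t toy_next(void)
{
    toy_state ^= toy_state << 13;
    toy_state ^= toy_state >> 7;
    toy_state ^= toy_state << 17;
    return toy_state;
}

/* Fork, draw one value in the child and one in the parent.
 * Returns 1 if both values match, 0 if they differ, -1 on error. */
int fork_duplicates_stream(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: draw from the inherited state and report the value. */
        uint64_t v = toy_next();
        close(fds[0]);
        ssize_t n = write(fds[1], &v, sizeof v);
        _exit(n == (ssize_t)sizeof v ? 0 : 1);
    }

    /* Parent: read the child's value, then draw its own. */
    close(fds[1]);
    uint64_t child_value = 0;
    ssize_t n = read(fds[0], &child_value, sizeof child_value);
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0 || n != (ssize_t)sizeof child_value)
        return -1;
    return toy_next() == child_value;
}
```

Without a reseed in one of the two processes, the function reports a match: parent and child produce the same "random" output from the copied state.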


I read the article again and now I think I understand: LibreSSL has its own PRNG which is seeded separately from the system's. Now it's that decision I don't understand, but it seems a lot of other people don't either. Thanks!


As a developer who's integrated with OpenSSL several times (generally on Linux), I couldn't be more pleased with the results coming out of the LibreSSL effort.

Even if we end up with a list of Linux-specific gotchas (and I don't think we will), it is more a case of ten steps forward, one step back.


Can we, please, have a syscall in Linux that returns random bytes from the system CSPRNG or blocks if not seeded yet and doesn't involve dealing with file descriptors?

But even while one isn't available, why is LibreSSL trying to use a userland CSPRNG instead of always reading from /dev/urandom and aborting when that fails?
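For readers on newer systems: Linux later gained exactly this kind of call, getrandom(2), in kernel 3.17, with a glibc wrapper in `<sys/random.h>` from glibc 2.25. It did not exist when this thread was written. The sketch below assumes such a system; `fill_random` is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill buf with len bytes from the kernel CSPRNG without opening any file.
 * With flags == 0 the call blocks until the kernel pool has been initialised.
 * Returns 0 on success, -1 on error (errno is left set by getrandom). */
int fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = getrandom(p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* e.g. ENOSYS on kernels older than 3.17 */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Because no file descriptor is involved, this works inside a chroot without /dev and is unaffected by descriptor exhaustion. It does nothing about the fork problem if the caller keeps its own userland PRNG state.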


Yes. Actually, the relevant IETF list is now calling for that: Linux needs getentropy(2). I may cook up my own and submit it to LKML, or perhaps someone else can, but there's no way out of this one without kernel support.

I don't know why the rest of the function even exists. It's the kind of cruft LibreSSL is trying to get rid of.

I am not entirely sure a PRNG should even exist in the library, and personally, I'd pass it onto /dev/urandom or /dev/random or the relevant syscall.

I agree with making it (154-156) a hard kill for a TLS library not to be able to get entropy.

And, this is great! This is exactly the kind of thing we're able to find now that some of the code isn't a hedge-maze.


The comments in the code make it exceedingly clear that part of the reason for this code to exist is to make a point, and it seems they've succeeded very well at that.


Yeah, if LibreSSL continues to implement its own PRNG in userspace, a getentropy syscall only solves the chroot issue, not the fork issue. You're probably right that a PRNG should not exist in the library.


Please do submit it. Bonus points if you can make it work like BSD's random/urandom, i.e. block until the entropy estimator says "OK, it's secure to use now", but never block thereafter.

EDIT to add: Frankly, it's a disgrace that Linux doesn't already do this, instead choosing to push the burden of getting all the details right to userspace where you can be vulnerable to all sorts of interesting timing attacks, FD-based DoS, etc.


Just adding a link to the IETF discussion thread here (and yes, that's really what the randomness list is called!): https://www.ietf.org/mail-archive/web/dsfjdssdfsd/current/ms...

Theodore Ts'o isn't completely opposed to the idea, it seems.


> there's no way out of this one without kernel support

This problem is due to LibreSSL's internal architecture being too closely matched to OpenBSD, not a fundamental problem with Linux.


What are the advantages of a syscall in place of /dev/urandom?


Being able to call it from inside a chroot, or within a syscall-based sandbox (like SECCOMP_FILTER).


Couldn't you just call mknod() to recreate the node in a chroot?


Firstly: getentropy(2) is a /dev/random substitute, not urandom. You should call the process-context CSPRNG arc4random(3) in lieu of /dev/urandom. That should be in the standard library (and hopefully will be added). (It uses ChaCha20 nowadays, not RC4.)

Secondly: It can fail with EINVAL (bad pointer) or EIO (>256 bytes requested), but does not fail even in a condition where file descriptors are exhausted. I don't know of its behaviour if it is called too early in the boot process to have been safely seeded, but I hope it either errors loudly or blocks.


> Firstly: getentropy(2) is a /dev/random substitute, not urandom.

Mind though that under the BSDs, there's no functional difference between /dev/random and /dev/urandom.


Fewer syscalls.


No. read(2) is one syscall, and the cost of the initial open is negligible. One real motivation seems to be wanting programs to work without a /dev. I don't think that's a reasonable requirement.


Access to /dev is DOS'able in many situations by exhausting file descriptor / open files limits.


> Access to /dev is DOS'able in many situations by exhausting file descriptor / open files limits.

That's why you open /dev/urandom in advance of performing operations that require randomness. If that open fails, you don't go on to perform the operation that requires randomness.


Even if you have opened it, you have no guarantee that the file descriptor has not been closed since. Yes, that would be stupid of the user of the library, but many security lapses happen because people make stupid assumptions. Code to close all file descriptors on fork, for example, is fairly common, so you cannot safely assume that the file descriptor remains valid.
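For reference, the close-everything pattern looks roughly like the sketch below. The helper names are hypothetical, and the upper bound is passed in explicitly here; real code often derives it from `sysconf(_SC_OPEN_MAX)` or the process's descriptor limit.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Blunt sweep: close every descriptor in [lowfd, highfd), whoever owns it.
 * Returns the number of descriptors that were actually open. */
int close_fd_range(int lowfd, int highfd)
{
    int closed = 0;

    for (int fd = lowfd; fd < highfd; fd++) {
        if (close(fd) == 0)
            closed++;
    }
    return closed;
}

/* Returns nonzero if fd currently refers to an open file description. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

After such a sweep, a library's cached descriptor is invalid. Worse, the next open() in the process may be handed the same number, so a later read on the cached descriptor could silently read from an unrelated file rather than fail with EBADF.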


> Even if you have opened it, you have no guarantee that the file descriptor has not been closed since.

You can absolutely rely on internal file descriptors not being closed. A program that closes file descriptors it does not own is as buggy as a program that calls free on regions of memory it does not own. A library cannot possibly be robust against this form of sabotage. The correct response to EBADF on a read of an internal file descriptor is to call abort.

The "close all file descriptors" operation is most common before exec. After exec, the process is a new program that can open /dev/urandom on its own (since, as I've mentioned previously, it's a broken environment in which /dev/urandom does not exist).


> You can absolutely rely on internal file descriptors not being closed.

I've explained several times why you can't. The program that closes all file descriptors may be broken, but the big problem is that as long as the library has no safe way of reporting this to the caller without breaking the OpenSSL API, they are faced with either breaking a ton of applications or finding an alternative. And they've explained why this is not an alternative (in the copious comments in the source):

> The correct response to EBADF on a read of an internal file descriptor is to call abort.

They have no control over whether or not this will result in an insecurely written core file that can leak data, and this is a common problem. If the person building the library knows that the environment it will be used in does not have that problem, it's one define to disable the homegrown entropy.

> The "close all file descriptors" operation is most common before exec.

I've seen it in plenty of code that did not go on to exec, e.g. to drop privileges for a portion of the code.


> I've explained several times why you can't

OpenSSL is crufty in part because it's full of workarounds for ancient, crufty code. LibreSSL shouldn't repeat that mistake. LibreSSL does have ways to report allocation failure errors to callers. It shouldn't even try to work around problems arising from applications corrupting the state of components that happen to share the same process. That task is hopeless and leads to code paths that are very difficult to analyze and test. You're more likely to create an exploitable bug by trying to cope with corruption than actually solve a problem --- and closing file descriptors other components own is definitely a form of corruption.

> [LibreSSL has] no control over whether or not [abort] will result in an insecurely written core file

The security of core files simply isn't LibreSSL's business. The mere presence of LibreSSL in a process does not indicate that a process contains sensitive information. LibreSSL has no right to replace system logic for abort diagnostics. If the developers believe that abort() shouldn't write core files for all programs or some programs or some programs in certain states, they should implement that behavior on their own systems. They shouldn't try to make that decision for other systems. LibreSSL's behavior here is not only harmful, but insufficient, as the library can't do anything about other calls to abort, or actual crashes, in the same process.

> I've seen it in plenty of code that did not go on to exec, to e.g. drop privileges for portion of the code.

Please name a program that acts this way.


> No.

open+read+close + all the mess associated with exhausting file descriptors > getentropy

fairly obvious, isn't it?


Of course getentropy would be better. But the current mechanism is not wrong or broken: at best, it's inconvenient. And it's certainly no excuse for the LibreSSL authors to write a library that calls raise(SIGKILL) on file descriptor exhaustion. That behavior, in many cases, amounts to a remote DoS. As long as this code is in the library (even if off by default), I'm hesitant to recommend LibreSSL.


Without a way to getentropy(2) [hint] that doesn't use file descriptors, it has no other secure choice but to raise(SIGKILL) in my opinion; a mere error might be overlooked, but continuing to run could expose secrets and keys, which is much worse than a DoS condition (anything in file-descriptor exhaustion when under attack is already being DoSsed). (It's turned off because coredumps could also do that locally.)


It's behind a define so there's no problem, don't turn it on if you don't like it. You'd never execute that code anyway, because you're the smart admin who knows better and always has the right devices in all the chroots. And who cares about other users! If they don't know better, they're doing it wrong. Put the blame on them. There is no problem.


Why must it have its own PRNG? Is there a problem asking the kernel (via /dev/urandom) for all required entropy, at the time it is needed? Or would this cause a real-world performance problem?

Surely this is an obvious first question that all commentators are stepping over?


It mentions chroot jails in which you can't access /dev/urandom.


Sure you can access /dev/urandom: just set up the device node in advance of the chroot. For better or worse, /dev, /proc, and so on are all part of the Linux system API and provide functionality not found elsewhere. Why would you expect programs to work when deprived of part of the API?


Why set up the device when you should be able to open it well before chrooting (if you can't, that's a proper time to abort() in my book) and keep the descriptor open for later use?


What if the random numbers are needed not by the daemon that chroots itself, but by a separate program that gets exec()'d within the chroot?


A good point. I hadn't thought about this case.


How does a program set up the device node in the chroot if it's jailed in the chroot? Is that possible? Would any non-root program be able to create device nodes?


You have to be root (well, have CAP_SYS_CHROOT) to call chroot in the first place. If you're root, you can call mknod to create your device nodes.


It is not a given that the process that runs LibreSSL at some point after the chroot happens will have CAP_SYS_CHROOT/root.

Also, it doesn't solve the issue, as /dev/urandom will also be temporarily inaccessible if you run out of file handles.


[deleted]


I agree, partly. The problem is that LibreSSL needs to decide what to do. The comments in the file outline why they believe they have few options: They are concerned about aborting, because of potentially unsafe core files (consider if someone lets you execute a suid process in a chroot that has access to read SSL keys you should not have access to, and you can cause it to dump a core file that you can read simply by DOS'ing access to /dev/urandom), and lack of other safe ways of reporting failure.

Couple that with the fact that whether or not /dev is set up correctly is a distraction: You have no guarantee that you will be able to open and read any file, as they cannot guarantee that nothing on the server can easily be used to hit file descriptor limits for a suitable process or open file limits for the entire system.

So this problem is there regardless of whether or not you're willing to demand a correctly configured /dev.


There are two issues here. One is the ability to obtain, given sufficient local resources, a file descriptor to /dev/urandom. I think defining away this problem is fine: running a program in an environment without a valid /dev is simply user error. In such an environment, operations that might require entropy should simply fail.

The second issue is resource limits: low level components of LibreSSL cannot cope with entropy-generating functions failing. On OpenBSD, these functions cannot fail, but on Linux, they can. That's not a problem with Linux, but with LibreSSL's architecture. It's LibreSSL's responsibility to ensure that it allocates what it needs ahead of time. Every call to LibreSSL's internal RNG is preceded by some kind of resource-allocating call that can fail. It's in this call that LibreSSL should obtain the resources needed to do its work. That it doesn't is simply a bug in LibreSSL, not a deficiency in Linux.


It becomes the application's problem. LibreSSL on its own can't really go out and obtain resources whenever it wants, nor can it ensure the application won't take these away. The application using LibreSSL would have to make LibreSSL do these things, at the right time and right place. You're pushing responsibility to users and applications and while at it admitting that it is impossible for a library to abstract it all away and provide a guaranteed safe API that cannot ever fail. It definitely sounds very Linuxy.


LibreSSL can pre-allocate resources it needs in places where applications expect it to fail, then go on to use those resources in places where applications do not expect it to fail. Complaining about having to manage a file descriptor is strange, since LibreSSL already manages another resource, heap memory. Nobody expects LibreSSL to be able to do everything it promises if the system runs out of memory, so why should people expect LibreSSL to do everything it promises if the system is misconfigured or out of file descriptors? Fundamentally, applications need to cope with resource constraints, and it's the job of libraries to tell applications when resource constraints have been exceeded. raise(SIGKILL) is an exceptionally poor way of informing an application that its resource demand has outstripped supply.


> Nobody expects LibreSSL to be able to do everything it promises if the system runs out of memory, so why should people expect LibreSSL to do everything it promises if the system is misconfigured or out of file descriptors?

The issue is not that anyone expects it to do everything it promises if the system is misconfigured, but that if it should fail, it should take care to try to avoid failing in ways that could open massive security holes.

This is the issue here: The developers believe that as the existence of systems with unsafe core files is well established, their options are limited, as there is a risk of exposing enough state to less privileged users with a core dump to leave the system vulnerable. Someone building for a system they know has properly secured core files, can disable the homegrown entropy code, and the code will fail hard if /dev/urandom and sysctl() are both unavailable, and the problem goes away.

But what do you suggest they do for the case where they do not know whether failing will expose sensitive data? They've chosen the option they see as the lesser of two evils: Do as best they can - only as a fallback, mind you - and include a large comment documenting the issues.

If they had full freedom to design their own API this would not be an issue. They could e.g. have put in place a callback that should return entropy or fail in an application defined safe way, or many other options. But as long as part of the point is to be able to support the OpenSSL API, their hands are fairly tied.


If LibreSSL was the origin of the API, I might agree with you. But LibreSSL is trying to largely conform to an API they have inherited. If that isn't feasible to do safely on Linux, then while it may be arguable whether or not it is a deficiency in Linux, it is a problem for Linux users.

For my part, I believe strongly that it is a deficiency if we have to go through all these kinds of hoops in order to safely obtain entropy, when the solution is so simple on the kernel end: Ensure we retain a syscall.


Exactly what part of LibreSSL's API prohibits the maintenance of an internal file descriptor and early checks that this file descriptor can be filled with a handle to /dev/urandom?

Resource management is not a "hoop". It is a fact of life.


Nothing. But nothing also prevents the client from intentionally or accidentally closing that file descriptor. As I've pointed out elsewhere, looping over all file descriptors and closing them on fork() is a common pattern for applications where you want to ensure you don't have resource leaks (whether for capacity or security).

Which is fine if you have a safe way of returning errors in all code paths, but as the comments point out, they believe they don't. Maybe they're wrong, but they seem to have spent some time thinking about it. They've also provided an easy define to change the behaviour to failing hard for people building it on systems where their caveats against failing hard do not apply (e.g. systems with secure core files).

If they can't fail early in a safe manner without potentially creating a security leak (as they potentially would if an unprivileged user could induce an unsafe core dump), and can't even be guaranteed that they're able to safely log the error (there's no guarantee they'd be able to write it anywhere), it's hard to see alternatives but to try to do something that is "good enough" as the alternative could be much worse.

Neither is a good solution.


> looping over all file descriptors and closing them on fork() is a common pattern for applications where you want to ensure you don't have resource leaks (whether for capacity or security).

Any application that does that is broken. Please stop trying to bring up this behavior as something a library needs to support. It isn't. If an application tries to close all file descriptors, then goes on to do real work, plenty of things other than LibreSSL will break.

Would you go around calling munmap on random addresses and expect your application to keep working? Would you write a library that tried to guard against this behavior?

> if you have a safe way of returning errors in all code paths, but as the comments point out, they believe they don't.

That's an internal LibreSSL problem. There's nothing stopping the LibreSSL team from implementing the correct plumbing for telling callers about errors. AFAICT, there is no sequence of valid OpenSSL API calls in which the library needs entropy but at no point in the sequence can indicate failure.

The problems you highlight are not things libraries should try to work around. They're systemic issues. Libraries calling raise(SIGKILL) because their authors don't believe systems have sufficiently secured their core file generation is absurd and only makes the problem worse because it makes overall system operation less predictable. (Imagine a poor system administrator trying to figure out why his programs occasionally commit suicide with no log message or core file.)

These are not problems that require system-level fixes. They're problems that require changes from LibreSSL.


> Any application that does that is broken. Please stop trying to bring up this behavior as something a library needs to support. It isn't. If an application tries to close all file descriptors, then goes on to do real work, plenty of things other than LibreSSL will break.

It doesn't matter that it is broken. It matters whether or not it is done and how to deal with it when it happens.

> If an application tries to close all file descriptors, then goes on to do real work, plenty of things other than LibreSSL will break.

Plenty of things that applications that do this will have to have successfully dealt with. Effectively failing hard now will be a change of behaviour that makes LibreSSL incompatible with the OpenSSL API it is trying to implement, and possibly cause security problems in the process.

Either they do this properly, or they need to break compatibility with OpenSSL sufficiently that people don't accidentally start exposing sensitive data because they mistakenly thought LibreSSL was a drop-in replacement (which it won't be if it does things like calling abort() in this case).

> Would you go around calling munmap on random addresses and expect your application to keep working? Would you write a library that tried to guard against this behavior?

Strawman. munmap() on random addresses is not something I have seen. Looping over all file descriptors and closing is something I have seen in lots of code. Code that works fine unless someone introduces a breaking change like suddenly holding onto a file descriptor the library previously didn't.

And when the risk is exposing sensitive data to a potential attacker, I would most certainly weigh the risks of failing hard vs. attempting a fallback very carefully.

> There's nothing stopping the LibreSSL team from implementing the correct plumbing for telling callers about errors.

There is: It would mean LibreSSL does not work as a drop-in replacement for OpenSSL. That may very well be a decision they have to make sooner or later, and may very well be the right thing to do, but there are big tradeoffs. They've chosen this avenue for now, with a large comment in the source making it clear that this is in part a statement about their belief that the best solution would be for Linux to keep a safe API to obtain entropy.

Note that this is code that will never even be executed when running on a current mainline kernel. It will break on systems where people have been overzealous about disabling sysctl(), or on systems moving to some future mainline kernel whose release date we don't know.

> The problems you highlight are not things libraries should try to work around. They're systemic issues. Libraries calling raise(SIGKILL) because their authors don't believe systems have sufficiently secured their core file generation is absurd

It's not absurd when we know for a fact that this often happens and is a common source of security breaches.

For systems where this is not an issue, disabling the fallback is a define away for your friendly distro packager.

> and only makes the problem worse because it makes overall system operation less predictable. (Imagine a poor system administrator trying to figure out why his programs occasionally commit suicide with no log message or core file.)

I'd rather be the system administrator trying to figure this out, than the system administrator that doesn't know that various data found in my core files have been leaked to an attacker.

It will also only happen if: /dev/urandom is inaccessible and you're running on a kernel without sysctl() and you've chosen this alternative over the built in fallback entropy source.

> These are not problems that require system-level fixes. They're problems that require changes from LibreSSL.

They're problems that may not be possible to fix for LibreSSL without failing to meet one of its main near-term goals of being a drop-in replacement for OpenSSL.

This is also likely to not only be a problem for LibreSSL - to me it raises the question of how many applications blindly assume /dev/urandom will always be available and readable. It is a system-level problem when every application that wants entropy needs to carefully consider how to do it to avoid creating new security holes, when the solution is simply to retain a capability that is currently there (the sysctl() avenue) or to implement getentropy().

We're not likely to agree on this, ever. We're going in circles now, and just reiterating largely the same argument from different angles.

I won't comment any more on this, other than saying that for me, it's a matter of a basic principle: Assume everything will fail, and think about how to be the most secure possible in this scenario. To me, that makes the LibreSSL developers' decisions seemingly the only sane choice in a bad situation, assuming the constraint of sticking to the OpenSSL API. Long term I think they ought to clean up the API too, but short term I think we'd get far more benefit out of them making it possible to safely replace OpenSSL first. And that may require sub-optimal choices to deal with the worst case scenarios, but then so be it.


> Plenty of things that applications that do this will have to have successfully dealt with.

If I have a choice of accommodating broken applications that close all file descriptors (and you have still not named one) and having a system in which libraries can retain internal kernel handles, I'll take the latter. LibreSSL already breaks compatibility with OpenSSL in areas like FIPS compliance. Compatibility with broken applications is another "feature" that would be best to remove.

> Strawman. munmap() on random addresses is not something I have seen. Looping over all file descriptors and closing is something I have seen in lots of code. Code that works fine unless someone introduces a breaking change like suddenly holding onto a file descriptor the library previously didn't.

There is no fundamental difference between a library-internal resource that happens to be a memory mapping and one that happens to be a file descriptor. Are you claiming that no libraries in the future should be able to use internal file descriptors because there are a few broken applications out there that like to go on close(2)-sprees?

If that level of compatibility is important to you, do what Microsoft did and implement appcompat shims. Did you know the Windows heap code has modes designed specifically for applications that corrupt the heap?

If you're not prepared to go down that road, please recognize that broken behavior is not guaranteed to work forever. There was a time when use-after-free was a very common pattern: on a single-thread system, why not use a piece of memory after free but before the next malloc? That pattern had to be beaten out of software in order to allow progress to be made. As it was then with memory, now it is again with indiscriminate file descriptor closing.

> There is: [proper error plumbing] would mean LibreSSL does not work as a drop-in replacement for OpenSSL

This claim is simply untrue. The OpenSSL API has a rich error-reporting interface. No compatibility break is required for reporting entropy-gathering failures. The only needed changes are inside LibreSSL, and its developers appear to be refusing to make these changes.

> for me, it's a matter of a basic principle

Another basic principle is that you can't get very far if you assume everything can fail. You have to have a certain axiomatic base. The ability to have private resources in a library is a perfectly reasonable basic assumption.


>applications that close all file descriptors (and you have still not named one)

elinks:

https://github.com/yggi49/elinks/blob/master/src/protocol/fi...

https://github.com/yggi49/elinks/blob/master/src/protocol/co...


That doesn't count. It's closing file descriptors before exec (and in one case, before doing some tightly-scoped work). It has nothing to do with closing all file descriptors, then expecting to use arbitrary third-party libraries.


An example of what happens if you don't close all file descriptors: https://bugs.php.net/bug.php?id=38915


Yes, letting child processes inherit stray file descriptors is dangerous. We're not talking about closing all file descriptors immediately before exec. We're talking about applications that say, "Ok now, I'm going to close all file descriptors and go back to what I was doing". You can't expect libraries to work when you've freed their internal resources from under them.


So most programs can't "just create a device node".


It isn't about performance, but instead /dev/urandom is believed to be a poor source of entropy by the OpenBSD developers.

I believe the heart of the issue is that /dev/urandom will give you a string even if it has very low entropy at the time.

You can find all sorts of articles for and against /dev/urandom and I don't really know enough to comment on its security, but I trust the team working on this fork more than I trust the OpenSSL foundation.


> I believe the heart of the issue is that /dev/urandom will give you a string even if it has very low entropy at the time.

On OpenBSD, /dev/urandom does the right thing, unlike Linux. As per http://www.2uo.de/myths-about-urandom/ -

> FreeBSD does the right thing: they don't have the distinction between /dev/random and /dev/urandom, both are the same device. At startup /dev/random blocks once until enough starting entropy has been gathered. Then it won't block ever again.


The issue is not that they believe /dev/urandom to be bad, but that it flat out isn't guaranteed to be available: If you're chroot'ed, chances are you won't have read access (or see) /dev/urandom. Furthermore, if you've run out of file handles (maybe intentionally - because someone figures they can try to DOS you to attack the PRNG), it is not a given you'll be able to open it even if it's visible.

LibreSSL tries /dev/urandom first, then falls back on a deprecated sysctl() interface, then tries its own "last resort fallback".


Then abort if it's not available? A lot of software (most?) doesn't work with an empty /dev. At least null is often required, so why not throw urandom in there as well?


The source explains why the developers do not see aborting as acceptable: It opens a huge security hole on systems where core files are insufficiently secured. On systems that are properly secured, it's a single define to cause it to fail hard when it can't use either /dev/urandom or sysctl().


I see, but I am not convinced. I think the argument still stands. There are always other ways to crash.


You're completely wrong. According to the OpenBSD devs, on modern BSDs and Linux, /dev/urandom is as good a source of entropy as anything. It's commonly implemented by a good cryptographically secure pseudo-random generator. This code only gets called in cases where /dev/urandom is not available (for example in a chroot jail or when the file descriptor limit is reached).


That was not his point. /dev/urandom on Linux does return low entropy strings when it simply doesn't have any at boot time. /dev/(u)random on BSD actually blocks until it has collected enough entropy to get going, and doesn't block thereafter.

So BSD /dev/urandom is more secure in that it never gives bad random numbers for some baseline badness. He was not factually wrong about that, although he is wrong in stating that that was the reason the OpenBSD developers are dismissing it.


You are correct about BSD /dev/urandom vs Linux /dev/urandom. But since LibreSSL isn't likely to be used at boot time, it doesn't try to work around that issue. Instead, the concern is "what should we do if we can't open /dev/urandom?"

For what it's worth, the "exploit" requires (1) the user deny access to both /dev/urandom and the sysctl interface, (2) multiple levels of forks (a child never has the same PID as its parent, but a grandchild can have the same PID as its grandparent), and (3) the grandparent must exit before the grandchild (PIDs uniquely identify all running processes). It's not something that will happen by accident, even to incredibly careless programmers.

But I do agree with the BSD guys that Linux should have another way to get entropy in this case (note that they have a similar file for OS X for similar reasons). And I hope it's not named CryptGenRandomBytes().


The "unlikely"s and "by accident"s in your comment are correct if and only if you assume too much about the scenario in which LibreSSL is to be deployed. It goes far beyond web servers, you know.

For example, what of routers that have no means of entropy input but interrupt timing? What of Android libraries that just use libcrypto? These systems are usually free to exploit by determined attackers!

LibreSSL/OpenSSL doesn't think "unlikely" and tries to cover as much as possible. The TLS library needs to work as well as possible regardless of the context.


>First, LibreSSL should raise an error if it can't get a good source of entropy.

Comments for getentropy_linux.c explain this http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libcrypto/cryp...

We have very few options:

- Even syslog_r is unsafe to call at this low level, so there is no way to alert the user or program.

- Cannot call abort() because some systems have unsafe corefiles.


> cannot call abort() because some systems have unsafe corefiles.

This logic seems specious. It's not the job of a library to solve that problem. If a system has crash dump collection configured insecurely, the problem is going to extend well past the SSL library.

> This can fail if the process is inside a chroot or if file descriptors are exhausted.

The right solution is to pre-open the file descriptor. SSL_library_init can fail. Do it there.
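As a rough illustration of "acquire in a place that can fail, use in a place that can't", here is a minimal Python sketch. The `EntropySource` class and its method names are hypothetical and are not part of LibreSSL or OpenSSL; the only assumption about the system is that /dev/urandom exists when the object is created.

```python
import os


class EntropySource:
    """Hypothetical wrapper that opens the device once, at init time."""

    def __init__(self, path="/dev/urandom"):
        # This is the step that can fail (chroot without /dev, descriptor
        # exhaustion), so it happens where the caller can handle OSError.
        self._fd = os.open(path, os.O_RDONLY)

    def read(self, n):
        # Later reads reuse the already-open descriptor. No path lookup is
        # needed, so this keeps working after a chroot.
        chunks = []
        remaining = n
        while remaining > 0:
            chunk = os.read(self._fd, remaining)
            if not chunk:
                raise OSError("unexpected EOF from entropy source")
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)

    def close(self):
        os.close(self._fd)
```

The trade-off raised elsewhere in the thread still applies: a descriptor held this way is lost if the application closes every descriptor, and it does not survive a re-exec unless it is passed on explicitly.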


NSS does something similar, since NSS will not be able to access /dev/urandom via the file system after the sandbox activates the chroot, so they reserve it first and fail with a log warning if no descriptors are available.


Warn? Hell, I'd hard-fail. Libraries need resources to do their jobs. The key is to acquire resources in places that can fail and use them in places that can't. I'm amazed and disappointed that the LibreSSL people aren't following this basic principle.


> The key is to acquire resources in places that can fail and use them in places that can't. I'm amazed and disappointed that the LibreSSL people aren't following this basic principle.

To be fair to the LibreSSL devs, the Linux-specific /dev/urandom code is currently encapsulated rather nicely behind an interface that's compatible with the OpenBSD getentropy() syscall. Following your suggestion would create a layer violation and move LibreSSL closer toward the (much maligned) OpenSSL approach to cross platform compatibility. I don't think this is a great excuse for the current design, but it's an explanation.


> Following your suggestion would create a layer violation and move LibreSSL closer toward the (much maligned) OpenSSL approach to cross platform compatibility.

The OpenSSL approach to portability is doomed: it can only deal with cosmetic differences between platforms. I appreciate the principle of using compatibility functions instead of #ifdef, but at some point, you need to incorporate the panoply of architectures into your design. It galls me to see the OpenBSD people claim that Linux is broken merely because it is different. That's incredible arrogance.


Isn't this the same way that they do porting for OpenSSH? Why do you say that method is doomed when it seems to have been working fine for over 10 years?


For OpenSSH, they were using OpenSSL as an abstraction layer for various things, including entropy gathering.


The comments don't justify why going to the sketchy entropy is better than SIGKILLing the process, except with:

> This code path exists to bring light to the issue that Linux does not provide a failsafe API for entropy collection.

Trying to make a point about Linux doesn't seem like a very good reason to me.


The sketchy entropy is only an example, and is a work in progress. The comments read "XXX Should be replaced with a proper entropy measure.", and the code is only called if entropy collection via /dev/urandom and sysctl has failed. If the sysctl method is deprecated it does raise(SIGKILL). They also rearranged getentropy_linux.c so that the main function with the important comments is at the top, in hopes that whoever is porting reads it.

If you were porting this to a GNU/Linux distro, you can read their list of options and raise (SIGKILL) resulting in silent termination if that's what your platform decided to do if both entropy methods fail, or test for it earlier and fail. Since they are BSD developers they leave it up to whoever is porting to decide.


It seems to be working, given the number of articles I've seen about this issue so far.


> Cannot call abort() because some systems have unsafe corefiles.

Huh, FreeBSD has MAP_NOCORE which allows the program to map pages that will explicitly not be included in the core file. I never realized that this was FreeBSD-specific extension (added in 2007?).

I'm really surprised other platforms haven't adopted it, though I surmise there's a good technical reason or two. (EDIT: or maybe there's similar functionality via another API? I haven't been able to turn up anything).


Linux since 3.4 has MADV_DONTDUMP [1], and there also appears to be a /proc filter file you can use to exclude general segments of memory from being dumped [2].

1. http://man7.org/linux/man-pages/man2/madvise.2.html

2. http://man7.org/linux/man-pages/man5/core.5.html


It's hard to blacklist every piece of memory that might be sensitive. It's a much better idea, IMHO, to just put corefiles in a location accessible only to root. That's how Windows, OS X, Ubuntu, Android, and lots of other commercial systems work.


.. which you need to do anyway. The SSL library, and the program linked against it, can fail in a thousand more ways that generate them.


Well, there is a way to disable core dumps entirely: setrlimit RLIMIT_CORE to 0
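For reference, a minimal sketch of that call from Python, using the standard `resource` module. The `disable_core_dumps` helper name is made up for illustration. Note the hedge: systems that pipe core dumps to a handler via core_pattern may treat this limit differently, so this is not a complete answer to unsafe corefiles.

```python
import resource


def disable_core_dumps():
    # Set both the soft and hard core-size limits to 0 bytes. Lowering the
    # hard limit means an unprivileged process cannot raise it again.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

A process would call this early, before it handles any key material.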


You can't simply seed it before a chroot. Look at the code. chacha adds entropy periodically and folds it in. You need entropy in the chroot. The author should probably read 10 lines below the same code he posted in the article. While I'd love to see a solution for this particular contrived example, consider that in the much more common use cases it actually is more secure than OpenSSL's. Especially so if your kernel has sysctl in it.


> chacha adds entropy periodically and folds it in. You need entropy in the chroot.

If that's the case then the fix will not be as simple as I envisioned it. Still, the point stands that LibreSSL should allow you to initialize the PRNG once, before you chroot, so that you can use the PRNG safely once inside the chroot. This could be accomplished by keeping a file descriptor to /dev/urandom open.


"If that's the case?" - Didn't you read the code? :) Sounds like you would prefer no stirring of any new entropy after you chroot... Looks to me like they're trying to require that additional entropy be available, always. (and if you don't have a completely hacked up kernel sysctl is still there..) - maybe we might get something better before it (sysctl) goes away for real instead of just in c-library-du-jour.


You're correct :-) - it does periodically stir in new entropy.

I'm fine with stirring in new entropy after chrooting - I just don't want to see sketchy entropy being used, especially for the initial entropy source. If you could make LibreSSL open (and keep open) /dev/urandom before you chroot, LibreSSL could read additional entropy from the already open file descriptor, even after chrooting.

In any case, note that the chroot issue is a bit of a sideshow compared to the much more serious fork issue.


Not sure how a library is going to keep a caller from closing a descriptor - I've certainly seen people attempt to close them all in code before a fork, but that's probably pathological. However that doesn't work across a re-exec, which would also be good practice in many situations (ASLR) - so having to keep a descriptor open to do this would actually discourage secure programming practices because the library would screw you then. What's here will work in that case from the look of it. (assuming sysctl is there, or the voodoo isn't really that bad, I can't tell myself yet... still looking)


Those are good points. Really, Linux needs a proper, non-deprecated, syscall for this.


It's reasonable to require that an exec in a chroot have a minimal /dev. If you execute a program in a broken environment, it breaks. That shouldn't be surprising.


There are many potential cases where LibreSSL won't be initialized until after a chroot, by a program with no special rights running inside the chroot, that is entirely unable to open /dev/urandom or create a device node, so this is not a solution.


What agwa wants from LibreSSL is to behave in every little bit exactly as OpenSSL does, even though OpenSSL itself is a complete and utter mess.

OpenSSL allowed developers to interfere with RNG freely, so LibreSSL must do that, too? [Even if times have changed?](http://permalink.gmane.org/gmane.os.openbsd.cvs/129485)

Well, you can't really go at improving and cleaning up the library if you have to keep up all the old bugs and the whole crusty API around.

It's inconceivable to expect LibreSSL to be both better than OpenSSL, yet to have the exact same API and the exact same set of bugs and nuances as the original OpenSSL.

LibreSSL is meant to be a simple-enough replacement of OpenSSL for most modern software out there (http://ports.su/) — possibly with some minimal patching (http://permalink.gmane.org/gmane.os.openbsd.tech/37599) of some of the outside software — and not a one-to-one drop-in-replacement for random edge cases that depend on random and deprecated OpenSSL craft.


Note that the usual post-fork catch-all security advice (having the child process exec() to wipe process state, thereby making a state leak really hard) solves the fork safety problem by giving the child a whole new PRNG instance, but actually makes it harder to solve the chroot safety problem.

There are various tricks to get a limited number of bytes from /dev/urandom into the chroot jail (such as by writing them to a regular file and secure-erasing that file when finished) to get around that.


How about passing the /dev/urandom file descriptor to the new process? That seems like the most robust solution to me.


That assumes that the first process after the chroot knows how to receive and pass on that file descriptor to the process that will eventually use libressl, which is not a given.


That does work well.


Looks like this issue has been addressed and fixes have been pushed. The fix involves something similar to the pthread_atfork solution a couple of people have suggested.

http://opensslrampage.org/post/91910269738/fix-for-the-libre...


This is just due to ignorance, Linux provides an AT_RANDOM auxv on process creation that could be used to seed the prng.


Look at the code. AT_RANDOM is used when it's available in the fallback function. For some reason, the devs don't seem to trust it much, according to the comment.


Is it guaranteed to at least be different for different processes? They could use that in addition to the PID test to know when to reseed.


No. It's only filled on execve, not fork.


Even though it looks like it won't get called, I'm wondering how bad the voodoo is? Anyone looked at what it is spitting into that hash function? How predictable are those clocks as they change between the memory fetches. Will Linux have predictable memory access times where those pages land?


Is LibreSSL written by the same team that wrote this incorrect article? https://news.ycombinator.com/item?id=7700312


How can 2 processes have the same PID, even if it is grandparent and grandchild? When I try killing a process using the PID, how does the kernel know which to kill?


Assume a fairly busy system

* Original process with PID 17519

* PID 17519 forks producing a new process with PID 26606

* PID 17519 produces some "random" bytes then exits

* PID 26606 forks producing a new process with the now unused PID 17519

* New PID 17519 produces some "random" bytes, which will be the same as the "random" bytes produced by original PID 17519, causing a raptor to attack the user.
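The sequence above can be simulated without forking at all. This is a toy model, not LibreSSL's code: `PidCheckedPrng` is a hypothetical generator whose only fork detection is "has the PID changed since I last looked?", and the PIDs are passed in by hand so the run is deterministic.

```python
import hashlib


class PidCheckedPrng:
    """Toy generator that reseeds only when the observed PID changes."""

    def __init__(self, state, pid):
        self.state = state
        self.seen_pid = pid

    def fork_copy(self):
        # fork() duplicates memory, including PRNG state and the cached PID.
        return PidCheckedPrng(self.state, self.seen_pid)

    def random_bytes(self, current_pid, fresh_entropy):
        if current_pid != self.seen_pid:
            # Fork detected: mix in fresh entropy and remember the new PID.
            self.state = hashlib.sha256(self.state + fresh_entropy).digest()
            self.seen_pid = current_pid
        out = hashlib.sha256(b"out" + self.state).digest()
        self.state = hashlib.sha256(b"next" + self.state).digest()
        return out


def demo():
    original = PidCheckedPrng(b"initial seed", 17519)
    child = original.fork_copy()                        # runs as PID 26606
    from_original = original.random_bytes(17519, b"e1")  # then it exits
    grandchild = child.fork_copy()                      # gets recycled PID 17519
    from_grandchild = grandchild.random_bytes(17519, b"e2")
    from_child = child.random_bytes(26606, b"e3")       # PID differs: reseeds
    return from_original, from_grandchild, from_child
```

In this model the grandchild's first output equals the original process's output, because the PID check never fires, while the intermediate child does reseed.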


So, PID is used as part of the CSPRNG?

If I get a block from /dev/urandom, then another one at some later time, what are the chances it's identical? Isn't that what you're saying here (or was the whole post intended to be comical and not just the last line).

[If only it had been raining it would have taken days longer for the raptor to attack, or something /random]


The problem is that (Libre|Open)SSL doesn't use /dev/urandom directly and instead implements a CSPRNG in user-space (seeded from /dev/urandom). And then you need to be careful to reseed the CSPRNG after fork or you'll generate the same random number that the other process did.
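One way to sketch "reseed after fork" is with an at-fork hook. Python's `os.register_at_fork` plays roughly the role that pthread_atfork plays in C. The `ReseedingPrng` class below is hypothetical and its hash-counter output function is a toy, not a vetted CSPRNG; the point is only where the reseed is triggered.

```python
import hashlib
import os


class ReseedingPrng:
    """Toy user-space generator that reseeds itself in forked children."""

    def __init__(self):
        self._pool = os.urandom(32)
        self._counter = 0
        # Ask the interpreter to call reseed() in every child of os.fork().
        os.register_at_fork(after_in_child=self.reseed)

    def reseed(self):
        # Replace the inherited pool so parent and child diverge.
        self._pool = os.urandom(32)
        self._counter = 0

    def random_bytes(self, n):
        out = b""
        while len(out) < n:
            self._counter += 1
            block = self._pool + self._counter.to_bytes(8, "big")
            out += hashlib.sha256(block).digest()
        return out[:n]
```

As with pthread_atfork, the hook only runs for forks that go through the hooked interface. A process created by calling the clone syscall directly would bypass it, which is the case raised earlier in the thread.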

You could mix in the parent's PID too, but that would only delay the problem (you'd need more layers of fork before triggering the shared-state bug again).

Why can't LibreSSL just open /dev/urandom once, on first call to RAND_poll/RAND_bytes/some-other-init-function, etc. and then always read from it directly. If that first open fails then you return an error from RAND_poll/RAND_bytes.


/dev/urandom is pretty slow on Linux. OpenSSL's CSPRNG is several times faster. On my workstation just now, I get 13MB/sec from /dev/urandom and 61MB/sec from OpenSSL's CSPRNG.

You can't just add the parent pid because that information is lost when the process's parent exits (the ppid becomes 1).


Note that the article calls out two entirely separate issues:

1) The PRNG wrapper apparently depends on the pid to "detect" if there's been a fork, and so the PRNG seed will remain the same in grandparent and grandchild if you double-fork and manage to get the same pid. This may or may not be a problem - whether or not you can induce enough forks to manage to get the right pid will depend on application. This is not dependent on

2) If both /dev/urandom and sysctl() is unavailable, it falls back on generating its own entropy using a convoluted loop and lots of sources. There's all kinds of ways that can be nasty, but that relies on enough factors that just getting the same pid would be insufficient in and of itself (but it may very well reduce the entropy).


They can't have the same PID at the same time.

I think the scenario here is that process X forks, creating process Y, then process X terminates, then process Y repeatedly forks until it creates a process Z with the same PID as X.


Precisely.


> How can 2 processes have the same PID,

PID namespaces?


That's not what I was thinking but it sounds like yet another case where you could fool LibreSSL's fork detection.


Schadenfreude?


Hardly. LibreSSL works just fine on OpenBSD and doesn't have this issue. Portability to Linux is a secondary concern, and this is only an initial stab at the portability layer for Linux.


We need to stop useless forks.


Can't the LibreSSL process just reseed whenever it is started? I guess forks don't actually copy the program counter so they'll have to go through main, right?


It does copy the PC. Actually it just returns from the fork call twice, once in the parent and once in the child, with different return values: the pid of the child to the parent and zero to the child.

See http://linux.die.net/man/2/fork for more details
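A small sketch of that behaviour, using Python's `os.fork` (a thin wrapper over the system call). The `fork_once` helper is made up for illustration; the child reports back over a pipe so the parent can show what each side saw.

```python
import os


def fork_once():
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Execution continues from here,
        # not from the start of the program.
        os.close(read_end)
        os.write(write_end, b"child saw 0")
        os._exit(0)
    # Parent: fork() returned the child's PID.
    os.close(write_end)
    message = os.read(read_end, 64)
    os.close(read_end)
    os.waitpid(pid, 0)
    return pid, message
```

Because the child resumes mid-program with a copy of the parent's memory, any PRNG state held in user space is duplicated too, which is why the library cannot rely on "going through main" again.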


> I guess forks don't actually copy the program counter so they'll have to go through main, right?

This is the way the fork syscall works on all Unices, the fork will start execution right after the fork system call.


Nope, forks actually do copy the program counter.


It is not entirely clear what is the risk of this strange scenario involving a grandchild process and pids wrapping around in an alarmingly quick way.


The risk is that in some situations (it should not matter how often; the environment might be somewhat attacker-controlled) two processes produce identical random numbers. This is bad, because this breaks the assumption that random numbers are independent. A program may reasonably fork into two processes, one which uses random numbers to generate RSA keys and one which outputs random numbers to anyone who wants them. LibreSSL's flaw may allow these two processes to destroy each other's security guarantee.


Ahh, not really - while the process thing the author describes is real - what you're saying is that any two processes show the same values, and that isn't the case. The bad guy needs to control one process to read the values in a useful way, have it exit, and be able to manipulate the system by killing or creating processes until his intended victim comes up on his selected PID. While that's far from impossible to do (just as the author's program does it), it is likely going to imply enough access to your system by the attacker that you're already pretty much p0wned.


An attacker doesn't necessarily need to know the random values themselves to pull off an attack. For example, if a nonce is re-used, an attacker might be able to decrypt data sniffed from the network. Also, creating processes to force a PID wraparound might be as simple as making repeated requests to a server.
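To make the nonce point concrete, here is a toy stream cipher in Python. The keystream construction (SHA-256 over key, nonce and a counter) is an assumption chosen for brevity and is not what TLS uses; the property shown holds for any cipher that XORs a keystream derived from (key, nonce) into the plaintext.

```python
import hashlib


def keystream(key, nonce, length):
    # Toy keystream for illustration only, not a vetted cipher.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key, nonce, plaintext):
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


key = b"k" * 32
nonce = b"same-nonce"  # two processes produced the same "random" nonce
p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"
c1 = encrypt(key, nonce, p1)
c2 = encrypt(key, nonce, p2)

# The attacker sees only c1 and c2, yet the keystream cancels out,
# leaving the XOR of the two plaintexts.
leak = xor(c1, c2)
```

The attacker never learns the nonce or the key here. Knowing or guessing one plaintext is then enough to recover the other.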


Don't rain on their parade. OpenSSL has been determined to be a laughing stock by super-informed internet forum people, and we need to keep up pretending that a rewrite of a major piece of internet infrastructure is feasible and makes sense.


Given things like the Debian OpenSSL fiasco and Heartbleed, can we honestly put as much faith into open source crypto as its well-funded proprietary counterparts?

I honestly prefer open source and recognize the problem the author points out as a clearly significant problem - as well as the benefits of LibreSSL, but I'm just not convinced there are enough eyeballs looking at open source crypto.


Sorry but that's just an idiotic assertion.

Closed source proprietary crypto, you just don't know who wrote it, who audited it and who backdoored it and who knows of any flaws in it.

Open source crypto, it's there. Go read the source. Anyone can and it's open for audit.

There aren't enough eyeballs I agree but there are infinitely more trustworthy people looking at it than closed source.


"Many eyes make bugs shallow", goes the saying; but it's true that that does require that people actually do look at it - and with the state OpenSSL is in it's clear people took it for granted for years. I'm as guilty of that as you. It was ugly and crufty and I'd assumed and hoped that it'd been thoroughly reviewed and was the way it was because it was being conservative with changes; turns out no, actually it's a giant hairball which they're now shaving, BoringSSL is trimming, and LibReSSL is gleefully taking a combine harvester to!

But reflect on this: we're looking at it now. There are more eyeballs looking at open-source crypto than closed source crypto. Reflect on that for a moment, and on the RSA BSAFE/NSA 'enabling' and the like, and remember that being well-funded didn't stop Apple's source-available implementation from going directly to fail.

I wonder, for example, what's really under the hood of, say, Microsoft SChannel/CAPI/CNG? I'm a reverse-engineer (which means I don't need no stinking source code, given a large enough supply of chocolate) so I may look in detail, when I get a large enough patch of free time. I've heard it's not as bad as it could be… but I know on this subject, for example it ships Dual_EC_DRBG as alternative in CNG (but uses CTR_DRBG by default from Vista SP1 onwards, thank goodness). The old RtlGenRandom wasn't too great, I know that much.


Transparency is a dependency of trust. The "well-funded proprietary counterparts" are non-transparent (per the definition of "proprietary software"), and therefore are untrustworthy.


Ever hear of BSAFE? They took a million dollars from the NSA to implant a backdoor. How do you evaluate code you cannot see?


yes, but I'm talking about RNG from likes of Microsoft or Apple...


The RNG device for Mac OS X's XNU kernel is open source [1]. The kernel for OS X is open source and you can compile your own kernels if you want.

[1]: http://www.opensource.apple.com/source/xnu/xnu-2422.1.72/bsd...


If you can't evaluate it, you can't trust it. Plus, Apple had gotofail, and MS has had its share of issues as well.


can't evaluate > not enough funds to evaluate

In other words, with proprietary sw, at least SOMEBODY evaluated it and placed their seal/name on it. With open source, you are relying on a hope that somebody out there somewhere does it. And in various cases, we've seen how that turned out.


... and in some cases, we've seen how using proprietary software turned out.

You're not making a substantial argument, but if someone has a track record of writing completely safe software that doesn't have to be secured through third party products, like Microsoft, I will be willing to listen.


These are the arguments people have been making for/against open source software for decades. They are old, tired, and well traveled. If we want to rehash them yet again, let's go to the Slashdot archives or Usenet and have a nostalgia party. Otherwise, let's try to focus on something actually interesting.



