
Show HN: Neverbleed – privilege separation engine for OpenSSL and LibreSSL - kazuho
https://github.com/h2o/neverbleed
======
fefe23
I applaud the effort but it does not address the elephant in the room: openssl
engines are synchronous. If you use openssl in an event-loop style server
(coincidentally, OP's own web server h2o appears to be event-loop based), this
means that the whole server blocks if the RSA operation blocks.

For a hardware accelerator that may not be so bad but if you are trying to
separate the HSM via the network, to minimize what an attacker can do after
compromising the httpd, then every packet loss or an outage or network delay
(or, more generally, any latency) would block the whole httpd.

Cloudflare proclaimed a while ago that they had a way to do essentially this
(with nginx+openssl) and they said their solution was non-blocking, but they
did not publish the code as far as I can tell.

I think if one wanted to solve this problem properly, larger architectural
changes to openssl would be necessary. Please correct me if I'm wrong!

EDIT: Also, if you move out the RSA operation, ideally you'd want to
distribute the work over more than one CPU core. If the operation is
synchronous, you can't really do that.
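
To make the blocking concern concrete, here is a minimal Python sketch (not OpenSSL, and not how any of the servers mentioned actually work). `slow_private_key_op` is a hypothetical stand-in for a synchronous private-key call that takes 200 ms, e.g. because it waits on a remote HSM. A heartbeat task records how long the event loop goes silent when the call runs inline versus when it is pushed to a worker thread.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def slow_private_key_op(delay=0.2):
    # Hypothetical stand-in for a synchronous RSA operation or remote HSM call.
    time.sleep(delay)
    return b"signature"


async def heartbeat(ticks, interval=0.02, duration=0.3):
    # Records a timestamp every `interval` seconds; a large gap between
    # timestamps means the event loop was blocked.
    end = time.monotonic() + duration
    while time.monotonic() < end:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


def max_gap(ticks):
    return max(b - a for a, b in zip(ticks, ticks[1:]))


async def run_inline():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0.05)
    slow_private_key_op()  # runs on the loop thread: everything else stalls
    await hb
    return max_gap(ticks)


async def run_offloaded():
    ticks = []
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=1) as pool:
        hb = asyncio.create_task(heartbeat(ticks))
        await asyncio.sleep(0.05)
        # The loop keeps serving other tasks while the worker thread waits.
        await loop.run_in_executor(pool, slow_private_key_op)
        await hb
    return max_gap(ticks)


inline_gap = asyncio.run(run_inline())
offloaded_gap = asyncio.run(run_offloaded())
print(f"longest stall inline:    {inline_gap:.3f}s")
print(f"longest stall offloaded: {offloaded_gap:.3f}s")
```

The worker thread is only one generic way to keep a loop responsive; whether something equivalent can be wired into a synchronous engine interface is exactly the open question above.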

~~~
kazuho
I assume you are referring to Keyless SSL.
https://blog.cloudflare.com/keyless-ssl-the-nitty-gritty-technical-details/

For Keyless SSL, it is necessary to make RSA operations asynchronous, since
the operations are requested over a TCP network (which may introduce large
delays).

OTOH Neverbleed delegates the operations within the same server using Unix
sockets, so there is no fear of such delays. The server also spawns a
dedicated thread for each client thread. In other words, the delay is
practically _no worse_ than what it is without Neverbleed.

As for _how bad_ it is, calculations related to TLS handshakes may block the
server for a few milliseconds. That may sound bad, but generally speaking it
is negligible compared to the latency over a public network.
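
For readers unfamiliar with the pattern, this is roughly what "delegating private key operations to another process over a Unix socket" looks like. It is a minimal Python sketch of the general idea, not Neverbleed's actual code or wire protocol: the names `key_daemon`, `spawn_key_daemon` and `sign_via_daemon` are made up for illustration, HMAC-SHA256 stands in for the RSA private-key operation, and the demo key is a literal where a real daemon would read a key file.

```python
import hashlib
import hmac
import os
import socket

DIGEST_LEN = 32  # SHA-256 digest size; requests and replies are fixed-size


def recv_exact(sock, n):
    # Reads exactly n bytes, or returns None if the peer closed the socket.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def key_daemon(sock, load_key):
    # Runs only in the child process; the key is loaded here, after the fork.
    secret = load_key()
    while True:
        digest = recv_exact(sock, DIGEST_LEN)
        if digest is None:
            break
        # Stand-in for the RSA private-key operation.
        sock.sendall(hmac.new(secret, digest, hashlib.sha256).digest())


def spawn_key_daemon(load_key):
    parent_sock, child_sock = socket.socketpair()  # connected Unix socket pair
    pid = os.fork()
    if pid == 0:
        parent_sock.close()
        try:
            key_daemon(child_sock, load_key)
        finally:
            os._exit(0)  # never fall through into the server's code
    child_sock.close()
    return parent_sock, pid


def sign_via_daemon(sock, digest):
    # Synchronous from the caller's point of view: send, then wait for reply.
    sock.sendall(digest)
    return recv_exact(sock, DIGEST_LEN)


sock, pid = spawn_key_daemon(lambda: b"demo-private-key")
digest = hashlib.sha256(b"handshake transcript").digest()
signature = sign_via_daemon(sock, digest)
sock.close()          # daemon sees EOF and exits
os.waitpid(pid, 0)
print(signature.hex())
```

The point of the split is that the server process only ever holds a socket, never the key material, so a memory disclosure bug in the server cannot leak the key.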

~~~
pquerna
The point is that it requires TLS handshakes to be done in a multi-threaded
system for a server handling high concurrency.

Many servers are multi-threaded, but many are not. Using the proposed
technique in a Node.js process, or nginx, is going to severely limit the
number of new connections per second.

~~~
kazuho
Wrong.

You seem to be confusing TLS handshakes with RSA operations.

In OpenSSL (which is used by many servers, including node.js and nginx), RSA
operations are always synchronous. Therefore, using Neverbleed does not impose
new limits regarding concurrency.

~~~
pquerna
IPC is going to block the event loop for longer than an inline RSA operation.

~~~
kazuho
It is true that an RSA operation over IPC is slower than doing it internally.
But the IPC round trip is orders of magnitude faster than the RSA operation
itself, so the slowdown is negligible in practice.

You can find the numbers in the FAQ section of the linked website.
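
If you want a rough feel for the two costs on your own machine, the following Python sketch times a 32-byte round trip over a Unix socket pair against a 2048-bit modular exponentiation. It does not reproduce the FAQ numbers and it does not use OpenSSL: the echo side is a thread in the same process, and `pow()` on random 2048-bit integers is only a crude stand-in for an RSA private-key operation (real implementations use CRT and are faster).

```python
import secrets
import socket
import threading
import time


def echo_server(sock):
    # Echoes whatever it receives until the peer closes the socket.
    while True:
        data = sock.recv(256)
        if not data:
            break
        sock.sendall(data)


def time_ipc_round_trip(rounds=200):
    a, b = socket.socketpair()
    t = threading.Thread(target=echo_server, args=(b,), daemon=True)
    t.start()
    payload = b"x" * 32
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(payload)
        a.recv(256)
    elapsed = time.perf_counter() - start
    a.close()  # echo thread sees EOF and returns
    t.join()
    b.close()
    return elapsed / rounds


def time_modexp(rounds=20, bits=2048):
    # Random odd modulus and exponent of the right size; NOT a real RSA key.
    n = secrets.randbits(bits) | (1 << (bits - 1)) | 1
    d = secrets.randbits(bits) | 1
    m = secrets.randbits(bits - 1)
    start = time.perf_counter()
    for _ in range(rounds):
        pow(m, d, n)
    return (time.perf_counter() - start) / rounds


ipc_seconds = time_ipc_round_trip()
modexp_seconds = time_modexp()
print(f"socket round trip: {ipc_seconds * 1e6:.1f} us")
print(f"2048-bit modexp:   {modexp_seconds * 1e6:.1f} us")
```

Absolute numbers will vary a lot by machine and interpreter; only the ratio between the two lines is of interest here.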

------
ibejoeb
Seems like a good idea, but it also seems like a lot of IPC. The author
suggests that crypto costs more than IPC, so it doesn't matter much, which
also seems reasonable, and I suppose most related DoS attacks are mitigated by
disallowing client-initiated renegotiation anyway.

    > Q. How much is the overhead?
    > Virtually none.
    > On my Linux VM running on Core i7 @ 2.4GHz (MacBook Pro 15" Late 2013)...

Would love to see it on a high-end system that's primarily doing termination.

~~~
feld
Why would it matter? He is already fighting an uphill battle with
virtualization and still sees a minimal amount of overhead. I expect running
this on big iron should show similar or less overhead. If you're just looking
for benchmarks of OpenSSL termination, there are already a ton of them out
there.

------
adekok
As an implementor using OpenSSL, I like this. It has a simple API, and is
clearly documented.

I wish all security improvements were as simple and easy.

------
ryan-c
Someone should make an LD_PRELOAD hack for this. I might try throwing one
together this weekend if someone doesn't beat me to it.

------
gg1989
How is it different from using SoftHSM?

~~~
acveilleux
Probably lower overhead. PKCS#11 is a lot bigger than this.

