
Improving HTTPS Performance with Early SSL Termination - tagx
http://blog.filepicker.io/post/29422604907/improved-https-performance-with-early-ssl-termination
======
buro9
I'm doing something very similar to this. The setup I'm using is:

DNSMadeEasy has a global traffic director
(<http://www.dnsmadeeasy.com/services/global-traffic-director/>)

That then directs each request to the closest Linode data center.

Linode instances run nginx, which proxies to Varnish, and the Varnish backend
is connected via VPN to the main app servers (based in the London data center,
as the vast majority of my users are in London).

I use Varnish behind nginx to additionally place a fast cache close to the
edge to prevent unnecessary traffic over the VPN.

Example: USA-to-London traffic passes over the VPN running within Linode, and
the SSL connection for an East Coast user only goes to Newark. If the request
was for a static file recently requested by some other user, the file would
come from Varnish and the request would not even leave the Newark data
center.
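
For illustration, the nginx side of each edge node is a minimal sketch like
this (certificate paths and names here are placeholders, not my exact config;
6081 is Varnish's default listen port):

    # Edge Linode: nginx terminates SSL, hands requests to local Varnish
    server {
        listen 443 ssl;
        server_name example.com;                         # placeholder
        ssl_certificate     /etc/nginx/ssl/example.crt;  # placeholder paths
        ssl_certificate_key /etc/nginx/ssl/example.key;

        location / {
            proxy_pass http://127.0.0.1:6081;  # Varnish's default port
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
        }
    }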

~~~
donavanm
Is this roll-your-own CDN significantly cheaper than another provider? Or is
there some other advantage?

~~~
buro9
I was already on Linode and I'm only serving a few hundred GB of static files
per day (with the Linodes I have this is well within my free quota).

In my instance (forums with current discussions), most static file requests
are for image attachments in the very latest discussions, the hot topics. So
Varnish fits this scenario really well. I didn't need long-term storage of
images in the CDN; I just needed to store the most recently requested items
in the CDN.

Linodes are cheap, I was already using them in a distributed fashion to reduce
SSL roundtrips, and introducing Varnish was a small configuration change.

I have tried a few other providers (most recently CloudFlare), but I was
generally not happy with them, usually due to a lack of visibility.

I proxy http:// images within user-generated content over https:// when the
sites are accessed over https://. Occasionally I found that images would not
load when I used a CDN provider for that, but I never had enough data and
transparency with the CDN to know why. Users notice this stuff, though, so
I'd have isolated users complaining of images not loading and no way to debug
or reproduce it.

So, as my scenario made Varnish a good fit, the bandwidth was within my
allowance, and it was easy to do... well, I just did it.

I still experiment with CDNs every now and then, but largely I get more
reliability and transparency from my own solution. I've also found this to be
cost effective, though I would be OK with paying a premium if I found the
reliability and transparency rivalled my home-rolled solution.

------
ammmir
Maybe I don't understand the problem correctly, but why not just preflight an
HTTPS request when your widget loads?

In the time it takes the user to pick their file(s) to upload, the initial SSL
negotiation will most likely have finished. And if you upload multiple files
serially, the browser should even reuse the current SSL context, so it
wouldn't be ~300ms per file.

~~~
tagx
We don't make any connections until the website calls us. At that point we
load a personalized dialog for the user and we want that request to be as fast
as possible.

------
jbyers
How do you manage the keepalive connection pool? Are you managing this in
nginx (via HTTP 1.1 backend support?) or using a different service?

We ran a test of this approach using a similar stack in 2010. We had Ireland,
Singapore, Sydney backhauling to Dallas, TX for a reasonably large population
of users. Managing the backend pool was a bit of a challenge without custom
code. nginx didn't yet support HTTP 1.1 backend connections. The two best
options I could find at that time were Apache TrafficServer and perlbal.
perlbal won and was pretty easy to set up with a stable warm connection pool.

Despite good performance gains, we didn't put the system into production. The
monitoring and maintenance burden was high, and at that time we lacked a
homogeneous network -- I tested Singapore and Australia using VPS providers,
as Amazon and SoftLayer (our vendors of choice) weren't there yet.

As a side effect of using the VPS vendors we did and trying to keep costs
under control, we had to ratchet the TTL for this service down uncomfortably
low to allow for cross-region failover. In Australia the additional DNS hit
nearly wiped out the gains in SSL negotiation.

With today's increased geographical coverage and rich set of services from
Amazon, this is a much less daunting project if you can stomach the
operational overhead.

Note that the lack of sanely-priced bandwidth and hosting providers in
Australia is a huge problem. When Amazon lands EC2 there, it's going to really
shake up that market.

~~~
tagx
We are using nginx. Newer versions support HTTP 1.1 backends (there is also a
patch for older versions of nginx).

~~~
jbyers
How do you get nginx to preconnect and maintain an appropriately sized
backend pool?

~~~
piotrSikora
Most likely using "keepalive": <http://nginx.org/r/keepalive>
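
A minimal sketch of what that looks like (the upstream name and backend host
are placeholders; note that connections are pooled after first use rather
than pre-opened):

    # http-context fragment: idle upstream connections cached per worker
    upstream app_backend {
        server app.example.com:443;  # placeholder backend
        keepalive 32;                # cache up to 32 idle connections
    }

    server {
        listen 80;
        location / {
            proxy_pass https://app_backend;
            proxy_http_version 1.1;          # keepalive upstreams need 1.1
            proxy_set_header Connection "";  # clear the default "close"
        }
    }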

~~~
jbyers
That would do it. Thanks.

------
steve8918
So, if my understanding is correct, they are trading SSL handshake latency
(which occurs once per connection) for the potential latency incurred by
having traffic redirected from multiple servers around the world to a single
set of application servers?

It seems like in the diagram, the West Coast Client, instead of making a
direct connection to the APP servers on the right, is instead making a
connection to the ELB on the left, which then forwards the traffic to the
nginx server, which forwards it to another ELB, which forwards it to the App
servers.

If the client connected directly to the ELB in front of the App Servers, they
would incur the SSL handshake latency, but would avoid the four extra hops
(two per send and two per receive) on the ELB and nginx.

Over the lifetime of the connection, is it possible that this latency could be
longer than 200 ms?

~~~
tagx
It is a possibility. However, I've measured 86ms between east and west EC2
instances, 96ms between my client on the west coast and an east EC2 instance,
and 15ms between my client and a west EC2 instance. Thus the additional
latency per round trip is only about 5ms (15ms + 86ms = 101ms through the
proxy, versus 96ms direct).

For the total added latency to exceed the ~200ms saved on the handshake,
about 40 requests (200ms / 5ms) would need to be made on the same connection,
which will not happen given the number of requests we do at a time.

------
WALoeIII
Isn't this a "poor man's version" of what CloudFlare offers?

They even have an optimized version called Railgun
(<https://www.cloudflare.com/railgun>) that only ships the diff across the
country.

------
crazygringo
Wow, this is actually really clever. Kudos to the engineer who thought of
this.

------
EvanAnderson
The "pool of warm keep-alive connections to the main web servers" is still
sending the traffic over HTTPS, then?

Edit: I'm clear that latency is reduced and how that's accomplished. I just
wanted clarification that the connections between the early SSL termination
point and the web servers were encrypted, too.

~~~
pjscott
Yes, but SSL connections are fine once they get going -- the nasty part is how
many round-trips are needed to complete the handshake. Any latency between the
client and the server is going to be multiplied several times over as they do
the initial ritual of verifying public keys and establishing a session key.

The trick here is to cut down on the latency of establishing the session.

------
hythloday
I'm sorry, I don't understand. How is this different from geographically
distributed reverse proxies?

~~~
lancefisher
These proxies are doing SSL between themselves and the app server and using a
pool of warm keep-alive connections to avoid multiple high-latency calls.
That's a little more than just a reverse proxy.

------
donavanm
You can get this from a CDN like AWS CloudFront as well. CloudFront will keep
a pool of persistent connections to the origin, whether it's S3 or a custom
origin. You can also do HTTP or HTTPS over the port of your choice on the
backend, enabling "mullet routing". The minimum TTL is 0, allowing you to vary
content for each request.

One issue with CloudFront is that the POST/PUT/DELETE verbs aren't currently
supported, which is a kink for modifying data. You could use Route 53's LBR
feature to route requests to nearby EC2 instances, then proxy back to your
origin.

------
alexchamberlain
Would it be more effective to forward plain HTTP over a VPN instead? For
example, you set up your servers in London, the East Coast, and the West
Coast and configure a VPN. People connect to their local servers via HTTPS,
and that server forwards the request to London via HTTP; the request would be
encrypted by the VPN. The advantage is that your proxy (nginx is good for
this) can bring up additional connections more quickly.

~~~
TheOnly92
I suppose you usually want to protect the client->server leg rather than just
receive encrypted things on the server side.

~~~
alexchamberlain
Sorry, I don't understand?

~~~
TheOnly92
Oh I misunderstood you, I thought you were saying Client->HTTP->VPN->Server.

------
stevencorona
So, the way I understand it, the connection between the load balancer <-> web
server is over the private network, right? And with VPC, your private network
is isolated and can't be snooped on by other Amazon customers?

Sounds cool, but this would only work on Amazon or in datacenters with
cross-data-center private networks (SoftLayer has this, for example).

~~~
tagx
No, the way it works is that a load balancer terminates SSL and forwards to
nginx instances, all in a private network. The nginx instances then have
secure HTTPS connections over the public internet to the main load balancer,
which terminates SSL and forwards it over a private network to the
application servers. So this would be possible with any network, since the
cross-country connections are encrypted.
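
In nginx terms, the edge side amounts to roughly the sketch below (hostnames
are placeholders, not our actual config; the regional load balancer in front
has already terminated the client's SSL):

    # Edge nginx: plain HTTP in from the regional ELB, re-encrypted out
    upstream main_lb {
        server main-lb.example.com:443;  # placeholder for the main ELB
        keepalive 64;                    # warm connections held open
    }

    server {
        listen 80;  # regional ELB forwards decrypted traffic here
        location / {
            proxy_pass https://main_lb;      # re-encrypt for the public hop
            proxy_http_version 1.1;          # needed for keepalive upstreams
            proxy_set_header Connection "";  # don't pass "Connection: close"
        }
    }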

~~~
kenshiro_o
That's a nice technique and the explanation is good while remaining concise.
We do something similar at work (I work in finance) where our clients connect
to a secure gateway using HTTPS but all communication with our other services
are made using an unsecure protocol. If it lives in your house then it's
likely to be harmless!

------
saurik
Standard CDNs will also accomplish this goal, and their bandwidth is normally
cheaper than EC2 instances.

~~~
tagx
Normal CDNs don't do this with dynamic content that changes on every user
request. Each API request we serve is different, and saving 200ms almost
doubles the performance.

~~~
saurik
This is at least incorrect for Akamai and CDNetworks (examples of large CDNs;
if you are talking about something silly like CloudFlare, then all bets are
off). I run my entire website, most of the content of which is dynamic,
through CDNetworks; they definitely maintain hot connections from their
systems through to my server, and use it for uncached origin fetches. For more
information on related performance improvements, see one of my earlier
comments.

<http://news.ycombinator.com/item?id=2823268>

~~~
jaylevitt
Sure, but Akamai is the "big iron" of CDNs - you can run your own custom code
in a JVM at their edge locations. So I kinda think anyone in the market for
Akamai isn't getting SSL termination advice on HN :)

------
dawolf
I think maintaining a pool of 'warm' HTTPS sessions between the nginx and the
app server is not a very flexible approach. What happens when all of those
are occupied? Wouldn't it be nicer to have an IPsec tunnel between the nginx
and the app server and open HTTP sessions on demand?

~~~
1SaltwaterC
> What happens when all of those are occupied?

Backlog. That increases the latency until a new connection can be accepted.
However, the number of pooled connections can be increased to a fairly large
number at the expense of more memory consumption, which isn't much of an
issue when using nginx as an HTTPS proxy.

~~~
dawolf
Was more a rhetorical question. ;)

------
iamrekcah
And... how is this different than SSLStrip? Except maybe that SSLStrip also
prints out the HTTP form values as the data passes through.

------
yalogin
I wonder what server certificate checking they are doing. It taking them
200ms seems like a lot.

~~~
slpsys
The 200ms is pretty well spelled out in the beginning of the post. It's not
that cert checking takes 200ms by itself; it's that sending any packet
cross-country takes 80-100ms round trip, so if you have to make two extra
cross-country trips... there's your 200ms.

------
aaronpk
Doesn't this mean the traffic is being sent un-encrypted across the ocean?

~~~
TallGuyShort
The impression I got from the article was that the warm keep-alive connections
were encrypted - the SSL handshake takes place ahead of time and then tunnels
multiple requests from multiple users - hence the lower latency.

Amazon's ELB (the EC2 load balancer) used to send HTTPS traffic to your back-
end unencrypted, but I believe they have since fixed this.

~~~
ghotli
Not sure what you mean by your ELB/HTTPS comment. ELB can be used as an HTTPS
terminator. It will then proxy traffic to your backend as HTTP. It can also be
used as a straight TCP proxy, but in that case it's just shoving along the
HTTPS request to an HTTPS terminator that you maintain.

~~~
TallGuyShort
>> Not sure what you mean by your ELB/HTTPS comment. ELB can be used as an
HTTPS terminator. It will then proxy traffic to your backend as HTTP.

That's what I mean. In that mode it's sending traffic that should be HTTPS
over HTTP.

------
pandemicsyn
Wait. "The actual HTTP request would then be sent to the intermediate instance
which then forwards it on" are you forwarding this on in plain text ? Is the
traffic at least traversing a VPN between the two locations ?

~~~
pjscott
As the article says, the intermediate traffic happens over long-lived HTTPS
connections.

------
MIT_Hacker
This post shows how engineers aren't always the best at showing off their
work. I think if the author abstracted this post and didn't dive so far into
the technical aspects of the problem, it could appeal to a much wider
audience.

For example, the discussion of nginx could be abstracted into a discussion of
graph theory, where a handshake has to occur with a secure cluster of nodes.

This is all just IMHO. Great post though!

