Hacker News
Improving HTTPS Performance with Early SSL Termination (filepicker.io)
97 points by tagx on Aug 14, 2012 | 59 comments

I'm doing something very similar to this. The setup I'm using is:

DNSMadeEasy has a global traffic redirector ( http://www.dnsmadeeasy.com/services/global-traffic-director/ )

That then sends a request to the closest Linode data center.

Linode instances run nginx, which proxies to Varnish, and the Varnish backend is connected via VPN to the main app servers (based in the London data center, as the vast majority of my users are in London).

I use Varnish behind nginx to additionally place a fast cache close to the edge to prevent unnecessary traffic over the VPN.

Example: USA-to-London traffic passes over the VPN running within Linode, and the SSL connection for an East Coast user only goes as far as Newark. If the request is for a static file that some other user recently requested, the file comes from Varnish and the request never even leaves the Newark data center.
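A rough sketch of the edge-caching idea, assuming Varnish behaves roughly like a least-recently-used cache for hot static files. `EdgeCache` and `fetch_from_origin` are hypothetical names; the callable stands in for a fetch over the VPN to the London app servers and is not a Varnish API.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Keep the most recently requested static files at the edge (LRU eviction).

    Hypothetical: `fetch_from_origin` models a request over the VPN to the
    main app servers; it is not part of any Varnish interface.
    """

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, path: str) -> bytes:
        if path in self._items:
            # Hit: serve from the local PoP and mark as most recently used.
            self._items.move_to_end(path)
            return self._items[path]
        # Miss: cross the VPN once, then keep a local copy.
        body = self.fetch_from_origin(path)
        self._items[path] = body
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used file
        return body
```

A second request for the same path never calls the origin, which is the "request never leaves Newark" case.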

I can't edit my post but I should note that there is an edge case I'm aware of where this kind of solution might not be the fastest solution for the end user, and this would likely affect what filepicker.io are doing too.

The edge case is that some DNS providers (Google, OpenDNS) already pick what they feel is the closest end point.

I read about that stuff over here a while ago: http://tech.slashdot.org/story/11/08/30/1635232/google-and-o...

And this comment explains it best: http://tech.slashdot.org/comments.pl?sid=2404950&cid=372...

I haven't fully investigated this, and I don't know whether it affects any users. But when I implemented my solution, I was aware that for some small subset of users this might not result in a faster connection than if I'd done nothing at all (the Google resolver handling their query may actually be further from the customer than the local server I run).

I'm just betting that for the vast majority of users this does bring about a noticeable increase in speed.

Google runs lots of DNS servers. You (your ISP) pick the closest to you. That server in turn does the lookup and gets the Linode closest to Google DNS, which should also be close to you.

If you're using a North American Google DNS server, you'll get answers that say NA. If you use the DNS server in Europe, you'll get answers that say EU.

I'm assuming Google doesn't try to sync and cache between instances, but I don't see why they would. That's a lot of work for no benefit.

IIRC, Google Public DNS has cache coherency within a location, but not between locations.

From what I've seen, end users hit a Google DNS cluster in their approximate geographic area. However, I've definitely seen odd peering result in a public DNS node in the EU hitting a provider's anycast nodes in NA.

That's pretty much it.

If a request surfaced at a DNS server in North America, and DNSMadeEasy (in my example) then answered that request with "Oh, you must be in North America, well for you the IP address of the web site is"... then you might not get the answer you expected.

For example, you might be in Spain and using OpenDNS (or Google), and the DNS query against DNSMadeEasy might surface on the East Coast of the USA, so you'd end up at Linode Newark rather than Linode London.
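A toy model of this edge case, assuming a geo-DNS service that can only see the recursive resolver's address, not the end user's. The region names and the PoP table are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical mapping from the region a DNS query arrives from to the Linode PoP handed out.
POP_FOR_REGION: Dict[str, str] = {
    "eu": "london",
    "us-east": "newark",
    "us-west": "fremont",
}


def geo_dns_answer(resolver_region: str) -> str:
    # The authoritative server only sees the resolver (e.g. a public DNS node),
    # so it answers for the resolver's region, not the user's.
    return POP_FOR_REGION[resolver_region]


def route(client_region: str, resolver_region: str) -> Tuple[str, bool]:
    """Return the PoP the client ends up at and whether it is the one nearest the client."""
    pop = geo_dns_answer(resolver_region)
    return pop, pop == POP_FOR_REGION[client_region]
```

`route("eu", "us-east")` returns `("newark", False)`: the user in Spain whose query surfaced in the US.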

That particular example is pure speculation, but it illustrates the point.

As I said, I believe this happens rarely enough to be a slight edge case that, as a whole, isn't worth troubling about. But it is an edge case I'm aware of.

And if someone reads this thread and thinks, "Hey, this distributing SSL stuff is a great idea.", then as always, caveat emptor: check whether any of these potential issues matter for you and your application.

That's a great idea. We're hosting with SoftLayer and have been considering doing something similar. They offer free bandwidth on the private network between data centers (and have pretty good pings between them). With a cloud server in each data center, you could achieve a similar thing while avoiding the need for a VPN and not paying for extra bandwidth.

Is this roll-your-own CDN significantly cheaper than another provider? Or is there some other advantage?

I was already on Linode and I'm only serving a few hundred GB of static files per day (with the Linodes I have this is well within my free quota).

In my instance (forums with current discussions), most static file requests are for image attachments in the very latest discussions, the hot topics. So Varnish fits this scenario really well. I didn't need long-term storage of images in the CDN; I just needed to store the most recently requested items.

Linodes are cheap, I was already using them in a distributed fashion to reduce SSL roundtrips, and introducing Varnish was a small configuration change.

I have tried a few other providers (most recently CloudFlare). But I was generally not happy with them, usually due to a lack of visibility.

I proxy http:// images within the user-generated content over https:// when the sites are accessed over https://. Occasionally I found that images would not load when I used a CDN provider for that, but I never had enough data and transparency from the CDN to know why. Users notice this stuff though, so I'd have isolated users complaining of images not loading and no way to debug or reproduce it.
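A minimal sketch of rewriting `http://` image URLs in user-generated HTML so they load through your own `https://` proxy and don't trigger mixed-content warnings. The endpoint `https://example.com/img-proxy?url=` is a placeholder, and a regex is only adequate for simple, already-sanitised markup; a real implementation would more likely use an HTML parser.

```python
import re
from urllib.parse import quote

# Placeholder proxy endpoint on your own HTTPS host (hypothetical).
PROXY_PREFIX = "https://example.com/img-proxy?url="

# Matches src="http://..." or src='http://...' inside <img> tags only.
_IMG_HTTP_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(http://[^"\']+)\2', re.IGNORECASE)


def proxy_insecure_images(html: str) -> str:
    """Rewrite http:// image URLs so an https:// page fetches them via the proxy."""

    def _rewrite(m: re.Match) -> str:
        prefix, quote_char, url = m.group(1), m.group(2), m.group(3)
        # Percent-encode the whole original URL so it survives as a query parameter.
        return f"{prefix}{quote_char}{PROXY_PREFIX}{quote(url, safe='')}{quote_char}"

    return _IMG_HTTP_SRC.sub(_rewrite, html)
```

Images already served over `https://` and non-image links are left untouched.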

So I found that as my scenario made Varnish a good fit, and the bandwidth was within my allowance, and it was easy to do... well, I just did it.

I still experiment with CDNs every now and then, but largely I get more reliability and transparency from my own solution. I've also found this to be cost effective, though I would be OK with paying a premium if I found the reliability and transparency rivalled my home-rolled solution.

Static files are still served by a normal CDN. This helps with dynamic HTTPS requests that change each time.

Maybe I don't understand the problem correctly, but why not just preflight an HTTPS request when your widget loads?

In the time it takes the user to pick their file(s) to upload, the initial SSL negotiation will most likely have finished. And if you upload multiple files serially, the browser should even reuse the current SSL context, so it wouldn't be ~300ms per file.
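A back-of-the-envelope model of the preflight suggestion, assuming the handshake starts when the widget loads and the user spends `think_time_ms` picking files. The ~300ms figure comes from the comment; the function names are hypothetical.

```python
def visible_handshake_delay_ms(handshake_ms: float, think_time_ms: float, preflight: bool) -> float:
    """Handshake delay the user actually waits for before the first upload starts."""
    if not preflight:
        return handshake_ms
    # The handshake overlaps with file picking; only the remainder is visible.
    return max(0.0, handshake_ms - think_time_ms)


def total_handshake_cost_ms(handshake_ms: float, files: int, reuse_connection: bool) -> float:
    """Serial uploads pay the handshake once with connection reuse, otherwise once per file."""
    return handshake_ms if reuse_connection else handshake_ms * files
```

`visible_handshake_delay_ms(300, 2000, True)` returns `0.0`: the handshake is hidden entirely if the user takes two seconds to choose a file.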

We don't make any connections until the website calls us. At that point we load a personalized dialog for the user and we want that request to be as fast as possible.

How do you manage the keepalive connection pool? Are you managing this in nginx (via HTTP 1.1 backend support?) or using a different service?

We ran a test of this approach using a similar stack in 2010. We had Ireland, Singapore, and Sydney backhauling to Dallas, TX for a reasonably large population of users. Managing the backend pool was a bit of a challenge without custom code, since nginx didn't yet support HTTP 1.1 backend connections. The two best options I could find at that time were Apache Traffic Server and perlbal. perlbal won and was pretty easy to set up with a stable, warm connection pool.

Despite good performance gains, we didn't put the system into production. The monitoring and maintenance burden was high, and at that time we lacked a homogeneous network -- I tested Singapore and Australia using VPS providers, as Amazon and SoftLayer (our vendors of choice) weren't there yet.

As a side-effect of using the VPS vendors we did and trying to keep costs in control, we had to ratchet the TTL for this service down uncomfortably low to allow for cross-region failover. In Australia the additional DNS hit nearly wiped out the gains in SSL negotiation.

With today's increased geographical coverage and rich set of services from Amazon, this is a much less daunting project if you can stomach the operational overhead.

Note that the lack of sanely-priced bandwidth and hosting providers in Australia is a huge problem. When Amazon lands EC2 there, it's going to really shake up that market.

We are using nginx. Newer versions support HTTP 1.1 backends (There is also a patch for older versions of nginx)

How do you get nginx to preconnect and maintain an appropriately sized backend pool?

Most likely using "keepalive": http://nginx.org/r/keepalive
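As far as I know, nginx's upstream `keepalive` directive doesn't preconnect; it caches up to N idle connections per worker after they've been used and closes the least recently used ones beyond that. A minimal Python model of that behaviour, with hypothetical `open_conn`/`close_conn` callables standing in for real upstream connections:

```python
from collections import deque
from typing import Callable, Deque, Generic, TypeVar

Conn = TypeVar("Conn")


class IdleConnectionCache(Generic[Conn]):
    """Reuse idle upstream connections; open new ones only when none are idle.

    Loosely modelled on nginx's `keepalive N`; `open_conn` and `close_conn`
    are hypothetical callables supplied by the caller.
    """

    def __init__(self, max_idle: int, open_conn: Callable[[], Conn],
                 close_conn: Callable[[Conn], None]):
        self.max_idle = max_idle
        self.open_conn = open_conn
        self.close_conn = close_conn
        self._idle: Deque[Conn] = deque()  # left end = least recently used

    def acquire(self) -> Conn:
        if self._idle:
            return self._idle.pop()  # warm: no new handshake needed
        return self.open_conn()  # cold: pays the full handshake

    def release(self, conn: Conn) -> None:
        self._idle.append(conn)
        if len(self._idle) > self.max_idle:
            self.close_conn(self._idle.popleft())  # drop the least recently used
```

Note that, as in nginx, the cache bounds only idle connections, not the total number open.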

That would do it. Thanks.

So, if my understanding is correct, they are trading SSL handshake latency (which occurs once per connection) for the potential latency incurred by having traffic redirected from multiple servers around the world to a single set of application servers?

It seems like in the diagram, the West Coast Client, instead of making a direct connection to the APP servers on the right, is instead making a connection to the ELB on the left, which then forwards the traffic to the nginx server, which forwards it to another ELB, which forwards it to the App servers.

If the client connected directly to the ELB in front of the App Servers, they would incur the SSL handshake latency, but would avoid the four extra hops (two per send and two per receive) on the ELB and nginx.

Over the lifetime of the connection, is it possible that this latency could be longer than 200 ms?

It is a possibility. However, I've measured 86ms between east and west EC2 instances, 96ms between my client on the west coast and an east EC2 instance, and 15ms between my client and a west EC2 instance. Thus the additional latency per connection is only about 5ms.

For the total added latency to exceed 200ms, about 40 requests (200ms / 5ms) would need to be made on the same connection, which will not happen given the number of requests we make at a time.
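The arithmetic in this comment as a small sketch. The inputs are the measured RTTs quoted (86ms east-west, 15ms client to west, 96ms client to east) plus the roughly 200ms handshake saving; the function names are just for illustration.

```python
def extra_rtt_ms(edge_to_origin_ms: float, client_to_edge_ms: float,
                 client_to_origin_ms: float) -> float:
    """Extra round-trip time per request for client -> edge -> origin versus direct."""
    return edge_to_origin_ms + client_to_edge_ms - client_to_origin_ms


def break_even_requests(handshake_saving_ms: float, per_request_penalty_ms: float) -> float:
    """Requests on one connection after which the added hop cancels the handshake saving."""
    return handshake_saving_ms / per_request_penalty_ms
```

With those numbers the detour costs 5ms per request, and `break_even_requests(200, 5)` returns `40.0`.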

Isn't this a "poor man's version" of what cloudflare offers?

They even have an optimized version called railgun (https://www.cloudflare.com/railgun) that only ships the diff across country.
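Railgun's actual wire protocol isn't described here. As a rough illustration of the general idea of shipping only the difference between a page the edge already has and the new one, here is a toy delta encoder built on `difflib.SequenceMatcher`; this is not how Railgun is implemented.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# ("copy", start, end) reuses a slice of the old page; ("insert", text) ships new text.
Op = Union[Tuple[str, int, int], Tuple[str, str]]


def make_delta(old: str, new: str) -> List[Op]:
    """Describe `new` as copies from `old` plus the text that actually changed."""
    ops: List[Op] = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete": nothing to send; the receiver skips that part of the old page.
    return ops


def apply_delta(old: str, ops: List[Op]) -> str:
    """Rebuild the new page from the cached old page and the delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)
```

For a dynamic page where only a counter changed, the delta carries a few characters of new text plus slice references.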

Wow, this is actually really clever. Kudos to the engineer who thought of this.

The "pool of warm keep-alive connections to the main web servers" is still sending the traffic over HTTPS, then?

Edit: I'm clear that latency is reduced and how that's accomplished. I just wanted clarification that the connections between the early SSL termination point and the web servers were encrypted, too.

Yes, but SSL connections are fine once they get going -- the nasty part is how many round-trips are needed to complete the handshake. Any latency between the client and the server is going to be multiplied several times over as they do the initial ritual of verifying public keys and establishing a session key.

The trick here is to cut down on the latency of establishing the session.
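A simple round-trip-counting model of why the handshake hurts, assuming 1 RTT for the TCP handshake, 2 RTTs for a full (non-resumed) SSL/TLS 1.2-era handshake, then 1 RTT for the request. Server processing and certificate checks are ignored, so treat the output as a rough estimate, not a measurement.

```python
TCP_HANDSHAKE_RTTS = 1       # SYN / SYN-ACK before the client can send
TLS_FULL_HANDSHAKE_RTTS = 2  # full handshake, no session resumption


def first_response_ms(client_rtt_ms: float, backhaul_rtt_ms: float = 0.0) -> float:
    """Time until the first HTTPS response under a simple RTT-counting model.

    client_rtt_ms: round trip between the client and whichever server terminates SSL.
    backhaul_rtt_ms: extra round trip from that server to the app servers over an
    already-warm connection (0 when the client talks to the app servers directly).
    """
    setup = (TCP_HANDSHAKE_RTTS + TLS_FULL_HANDSHAKE_RTTS) * client_rtt_ms
    request = client_rtt_ms + backhaul_rtt_ms
    return setup + request
```

Plugging in RTTs quoted in this thread, `first_response_ms(96)` gives 384ms for a direct cross-country connection, while `first_response_ms(15, 86)` gives 146ms via a nearby terminator with a warm backhaul: the handshake round trips are paid over the short 15ms link.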

Yes, but the SSL handshake has already been completed ahead of time so all the overhead is reduced.

I'm sorry, I don't understand. How is this different from geographically distributed reverse proxies?

These proxies are doing SSL between themselves and the app server and using a pool of warm keep-alive connections to avoid multiple high-latency calls. That's a little more than just a reverse proxy.

That's what this is.

You can get this from a CDN like AWS CloudFront as well. CloudFront will keep a pool of persistent connections to the origin, whether it's S3 or a custom origin. You can also do HTTP or HTTPS over the port of your choice on the backend, enabling "mullet routing". The minimum TTL is 0, allowing you to vary content for each request.

One issue with CloudFront is that the POST, PUT, and DELETE verbs aren't currently supported, which is a kink for modifying data. You could use Route 53's LBR (latency-based routing) feature to route requests to nearby EC2 instances, then proxy back to your origin.

Would it be more effective to forward plain HTTP over a VPN instead? For example, you set up your servers in London, East Coast and West Coast and configure a VPN. People connect to their local servers via HTTPS and that server forwards it to London via HTTP; the request would be encrypted by the VPN. The advantage is that your proxy - Nginx is good for this - can bring up additional connections quicker.

I suppose that you usually want to protect the part from client->server rather than just receiving encrypted things from server side.

Sorry, I don't understand?

Oh I misunderstood you, I thought you were saying Client->HTTP->VPN->Server.

So, the way I understand it, the connection between the load balancer <-> web server is over the private network, right? And with VPC, your private network is isolated and can't be snooped on by other Amazon customers?

Sounds cool, but this would only work on Amazon or data centers with cross-data-center private networks (SoftLayer has this, for example).

No, the way it works is that there is a load balancer that terminates SSL and forwards the traffic to nginx instances, all in a private network. The nginx instances then have secure HTTPS connections over the public internet to the main load balancer, which terminates SSL and forwards the traffic over a private network to the application servers. So this would be possible with any network, since the cross-country connections are encrypted.
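One way to sanity-check the property described here is to list each hop and flag any where plaintext would cross the public internet. The hop list follows the description in this comment; the `Hop` type and `exposed_hops` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hop:
    src: str
    dst: str
    public_network: bool  # does this hop cross the public internet?
    encrypted: bool       # is the traffic on this hop inside SSL/TLS?


# The path as described above; names are illustrative.
PATH: List[Hop] = [
    Hop("client", "edge load balancer", public_network=True, encrypted=True),
    Hop("edge load balancer", "nginx", public_network=False, encrypted=False),
    Hop("nginx", "main load balancer", public_network=True, encrypted=True),
    Hop("main load balancer", "app servers", public_network=False, encrypted=False),
]


def exposed_hops(path: List[Hop]) -> List[Hop]:
    """Hops where plaintext would cross the public internet."""
    return [h for h in path if h.public_network and not h.encrypted]
```

For this path `exposed_hops(PATH)` is empty; plaintext only appears on the private-network legs.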

That's a nice technique, and the explanation is good while remaining concise. We do something similar at work (I work in finance), where our clients connect to a secure gateway using HTTPS but all communication with our other services uses an insecure protocol. If it lives in your house, then it's likely to be harmless!

Oh, I guess I misunderstood. The load balancer <-> web server connection is over HTTPS, not HTTP.

You can edit (or delete) comments here for up to 2 hours after you make them.

I don't think he should delete his comment. It's helpful to see it explained in more detail from other contributors.

Delete, no. Editing would be fine, below an "ETA:"

I just meant he could have edited his first comment instead of replying to himself.

You can have the endpoint servers participating in a VPN with the backend servers. They don't have to be on EC2. This way you wouldn't need to make the front-back requests via https.

Standard CDNs will also accomplish this goal, and their bandwidth is normally cheaper than EC2 instances.

Normal CDNs don't do this with dynamic content that changes on every user request. Each API request we serve is different, and saving 200ms almost doubles the performance.

This is incorrect for at least Akamai and CDNetworks (examples of large CDNs; if you are talking about something silly like CloudFlare, then all bets are off). I run my entire website, most of the content of which is dynamic, through CDNetworks; they definitely maintain hot connections from their systems through to my server, and use them for uncached origin fetches. For more information on related performance improvements, see one of my earlier comments.


Sure, but Akamai is the "big iron" of CDNs - you can run your own custom code in a JVM at their edge locations. So I kinda think anyone in the market for Akamai isn't getting SSL termination advice on HN :)

AWS CloudFront also supports persistent connection pooling to the origin.

I think maintaining a pool of 'warm' HTTPS sessions between nginx and the app server is not a very flexible approach. What happens when all of those are occupied? Wouldn't it be nicer to have an IPsec tunnel between nginx and the app server and open HTTP sessions on demand?

> What happens when all of those are occupied?

Backlog. That increases the latency until a new connection can be accepted. However, the number of pooled connections can be increased to a fairly large number at the expense of more memory consumption, which isn't an issue when using nginx as an HTTPS proxy.
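A sketch of the fixed-size variant being discussed, where every connection is opened up front and callers wait in a backlog once all are checked out. `WarmPool` and `open_conn` are hypothetical; `queue.Queue.get(timeout=...)` raising `queue.Empty` is the standard-library behaviour being relied on.

```python
import queue
from typing import Callable, Generic, TypeVar

Conn = TypeVar("Conn")


class WarmPool(Generic[Conn]):
    """Fixed-size pool of pre-opened ("warm") connections.

    When every connection is checked out, callers queue (the backlog) until one
    is returned or the timeout expires. `open_conn` is a hypothetical factory;
    a real one would perform the TLS handshake up front.
    """

    def __init__(self, size: int, open_conn: Callable[[], Conn]):
        self._free: "queue.Queue[Conn]" = queue.Queue()
        for _ in range(size):
            self._free.put(open_conn())  # pay every handshake ahead of time

    def acquire(self, timeout_s: float) -> Conn:
        # Blocks while all connections are busy; raises queue.Empty on timeout.
        return self._free.get(timeout=timeout_s)

    def release(self, conn: Conn) -> None:
        self._free.put(conn)
```

Raising `size` shortens the backlog at the cost of more idle connections held open.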

Was more a rhetorical question. ;)

And... how is this different than SSLStrip? Except maybe that SSLStrip also prints out the HTTP form values as the data passes through.

I wonder what server certificate checking they are doing. 200ms seems like a lot.

The 200ms is pretty well spelled out in the beginning of the post. It's not that cert checking is taking 200ms by itself, it's that sending any packet cross-country takes 80-100ms, round trip, and so if you have to go cross-country two extra trips...there's your 200ms.

Doesn't this mean the traffic is being sent un-encrypted across the ocean?

The impression I got from the article was that the warm keep-alive connections were encrypted - the SSL handshake takes place ahead of time and then tunnels multiple requests from multiple users - hence the lower latency.

Amazon's ELB (the EC2 load balancer) used to send HTTPS traffic to your back-end unencrypted, but I believe they have since fixed this.

Not sure what you mean by your ELB/HTTPS comment. ELB can be used as an HTTPS terminator. It will then proxy traffic to your backend as HTTP. It can also be used as a straight TCP proxy, but in that case it's just shoving along the HTTPS request to an HTTPS terminator that you maintain.

>> Not sure what you mean by your ELB/HTTPS comment. ELB can be used as an HTTPS terminator. It will then proxy traffic to your backend as HTTP.

That's what I mean. In that mode it's sending traffic that should be HTTPS over HTTP.

No, the pool of keep-alive connections are all encrypted as well.

Wait. "The actual HTTP request would then be sent to the intermediate instance which then forwards it on": are you forwarding this on in plain text? Is the traffic at least traversing a VPN between the two locations?

As the article says, the intermediate traffic happens over long-lived HTTPS connections.

This post shows off how engineers aren't the best at showing off their work. I think if the author abstracted this post and didn't dive so far into the technical aspects of the problem, it could appeal to a much wider audience.

For example, the discussion of nginx could be abstracted into a discussion of graph theory, where a handshake has to occur with a secure cluster of nodes.

This is all just IMHO. Great post though!

