Ask HN: What is an acceptable latency for a web load balancer?
79 points by axismundi on Oct 2, 2020 | 66 comments
We use an external company for network management. They've put a Citrix ADC load balancer in front of our web servers, which is also terminating TLS. I'm not sure if it was always like this (before I joined the company), but downloading a 1000-byte file via HTTP takes ~15ms while the same file via HTTPS takes at least 110ms. The external company ignores this and claims that SSL termination is expensive. I can understand a few ms, but clearly 100ms is rubbish.

What do you think folks?




100ms for setting up an initial TLS connection is about what I'd expect. It depends on the key size, ciphers, etc, but up to 300ms is typical.

However, this should only be done at the beginning of the connection. After this the client will have a symmetric encryption key that is much faster to use. Their load balancer should be caching these sessions so that subsequent connections don't need to re-negotiate a session key.

If this 110ms is only on the first request, or on a session-cache miss, then I'd say that's probably something you should be expecting. If it's after the TLS session has been set up, or on a cache hit, that sounds bad. It also could be that their session cache isn't large enough and is forgetting sessions too soon, causing more TLS negotiation than may be necessary.
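
One quick way to check whether resumption works at all is openssl's -reconnect, which reconnects five times reusing the session and reports each attempt as "New" or "Reused" (hostname below is a placeholder; for TLS 1.3 you may need -sess_out/-sess_in instead, since tickets arrive after the handshake):

    openssl s_client -connect YOUR_REMOTE_HOST_GOES_HERE:443 -reconnect < /dev/null 2>/dev/null | grep -E 'New|Reused'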


This is happening for every single request.


Make sure you’re reusing connections. A lot of languages and libraries create a new connection for every request unless you explicitly manage a session or connection pool.


Also, make sure the ADC is reusing connections. It's a setting on the vserver.


Their server and/or your client might not be set up for session reuse. Definitely a thing to check.
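
If it's a NetScaler-style ADC, TLS session reuse is a per-vserver SSL setting; from memory the CLI is roughly the following (the vserver name is made up, and the parameter names should be double-checked against your ADC version's docs):

    set ssl vserver vs_web_https -sessReuse ENABLED -sessTimeout 300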


Thank you. I will make sure this is enabled.


Most of the latency of a TLS handshake (which equals the minimum latency for the first response on an HTTPS connection) comes from round trips. You need 1 for TCP connection establishment. And 1 to 2 (depending on the TLS version) for the TLS handshake.

300ms would be rather on the high side. RTTs with today's infrastructure are often 30-50ms if you hit an edge server (as opposed to a server around the world). So you should end up at around 100ms for the complete TLS connection establishment.
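
Back-of-the-envelope with, say, a 35ms RTT:

    TLS 1.2:  1 RTT (TCP) + 2 RTT (TLS)  =  3 x 35ms  ~= 105ms to complete the handshake
    TLS 1.3:  1 RTT (TCP) + 1 RTT (TLS)  =  2 x 35ms  ~=  70ms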


100ms TLS session setup time with RTT <15ms seems rather unacceptable to me.

Can you provide the timings this command produces for you?

    curl -s -w 'TCP=%{time_connect} TLS=%{time_appconnect} ALL=%{time_total}\n' https://YOUR_REMOTE_HOST_GOES_HERE -o /dev/null

That should provide an easy-to-use ruler to compare measurements.
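
Something like this minimal sketch loops it (hostname placeholder as above, 20 requests, sorted by total time so the spread is easy to eyeball):

    #!/usr/bin/env bash
    # repeat the timing request against one host and sort by total time
    HOST="https://YOUR_REMOTE_HOST_GOES_HERE"
    for i in $(seq 1 20); do
        curl -s -o /dev/null \
             -w 'TCP=%{time_connect} TLS=%{time_appconnect} ALL=%{time_total}\n' \
             "$HOST"
    done | sort -t= -k4 -n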


I was bored so I tested this on AWS from an EC2 instance to an ELB HTTPS load balancer (in the same VPC), and I got about 5-10ms time_total as reported by curl. So that sort of gives a baseline for what TLS latency can be when the network is fast.


I've written a simple bash script based on your one liner: https://ybin.me/p/53f0aa4e1204b470#c561Frz6wpKkfvsTklDdZFOer...

Here is the output for 100 reqs: https://ybin.me/p/1e68f80f6910ba0d#USqFo3Loksx5rHQIe16NHp299...


what are the output units? seconds, or ms? my results make most sense if it's seconds:

    $ curl -s -w 'TCP=%{time_connect} TLS=%{time_appconnect} ALL=%{time_total}\n' https://<my-domain>/ -o /dev/null

    TCP=0.021106 TLS=0.052928 ALL=0.143908


All of curl's --write-out values that begin with `time_` are given in seconds.


thank you


I think if you're quibbling over 100ms you need to be paying more so that people listen to the issue you're having and work with you.

I'm not saying this to minimise it, like "oh, it's only 100ms". I mean there are providers out there that will work with you on this. They're not cheap.

If you pay enough you can reshape the internet - I once had an issue with a backup process and ended up having two ISPs start peering with each other to fix it.


Instead of saying 100ms, you can also say "the total request time increased more than sevenfold with the introduction of TLS (from 15ms to 110ms), something's probably wrong and we should take a further look at it".

I don't like playing these kind of games but sometimes you have to do it.


So many people in the industry don't understand when I say that a 100ms outage is a problem; it's depressing.


This depends what "industry" you're talking about.

If you're talking about "tech", I'd say 100ms doesn't matter for the vast majority of use-cases. Bear in mind that Australia/New Zealand have a ~250ms round trip latency to most web services, 4G can easily have a 300ms latency to Google, and developing countries are often in a much worse position. Depending on <100ms for anything is a non-starter for most of the web.

If you're a service provider several layers removed from a human, then there may be enough between you and the user that it's critical that you respond in <100ms, but this is not most of the industry.

More generally, I find this sort of comment to be similar to the sort of comment that asks why anyone would write a web service in a language as slow as Ruby or Python when we have languages like Go and Rust. They ignore the fact that there are huge, successful companies running on Ruby, Python, and other "slow" languages, all for very good reasons.


100ms becomes significant if you're doing teleconferencing or other real time interaction. If we somehow improved latency and jitter on the internet generally, I think that'd do more to improve its usefulness than expanding bandwidth. And don't forget about the studies out there that show a relationship between load time and revenue in e-commerce: "A 100ms decrease in checkout page load speed amounted to a 1.55% increase in session-based conversion."


Teleconferencing is a good example of where this does matter, although I'd suggest that despite the current situation we're all in, it's not a significant part of the industry.

> And don't forget about the studies out there that show a relationship between load time and revenue in e-commerce: "A 100ms decrease in checkout page load speed amounted to a 1.55% increase in session-based conversion."

This is true, although I believe this relationship doesn't hold when you're moving from 200ms to 100ms, it's more for when you're moving from 1000ms to 900ms.


This is 100ms for establishing a TLS connection - that's not the same as 100ms latency for all of a video stream.


Oh yeah, you're absolutely right. I got a bit tangential. Slow establishment wouldn't matter for a stream. I'm sure it's something more like the e-commerce page load time vs conversions scenario for OP.


I could live with slow handshake, but it's ~110ms for every request.


I had a similar problem working with a managed file transfer service. Lots of small files would get bogged down by these sorts of issues. The other side of the connection wouldn’t change their config, so I implemented a proxy that maintained session state on our side.


If doing realtime interaction it's unlikely they'd negotiate more than one connection (reusing connections or not), so the test they've done hardly seems useful.

I also have huge issues with a lot of those studies; going from 200ms -> 100ms is immensely different from going from 1500ms -> 1400ms.


Latency is capped by two thirds of the speed of light. The network latency is actually pretty good overall considering this (the issue is really on the user end, with terrible Wi-Fi and mobile connections).

The speed of light is quite impactful when you're looking at different countries or continents.


> Latency is capped by two thirds of the speed of light

Starlink: hold my beer.

I'm not an HFT; I know I can't increase the speed of light, and I know I can't tunnel through the earth to shave a few microseconds off a transmission. I can appreciate the difference between 300ms RTT and 900ms RTT - the latter being a practical minimum when you're coping with outages in the 100-200ms range and networks rerouting.

Give me jitter over drops any day.


Yes, this is a real time platform. I spend some of my dev time optimising performance.


Computer industry in general, but network providers specifically.

It's not about <100ms latency; it's about a <100ms outage.

I know I can't rely on providers to not have outages of 100ms -- if a link fails in Sudan and BGP flaps on a peering, it can take seconds, let alone milliseconds, to respond.

I have to be very careful, because if I want a two-way communication at that 250ms round trip time to NZ, that means I have to ensure my packets get there and back in 250ms, and not drop in the middle and require retransmissions, which rapidly escalates latency from 'barely noticeable' up to 'painful' levels (1 second or above is meaningless, may as well send a fax[1])

  0ms   : send packet containing information
  125ms : receive packet
  135ms : process information, send response
  260ms : receive response
  270ms : process response
Job done.

Trouble is, a 100ms outage from 0ms to 100ms means

  0ms   : send packet containing information
  125ms : fail to receive packet
  135ms : notice packet missing, issue request for retransmit
  260ms : receive retransmit request
  260ms : retransmit
  385ms : receive packet
  395ms : process information, send response
  620ms : receive response
  630ms : process response
Given I therefore need (630-270 = 360ms) of buffer, that's more than doubled the response.

But it gets worse

  125ms : fail to receive packet
  135ms : notice packet missing, issue request for retransmit
  (request gets lost)
  395ms : notice retransmit request not received, issue request for retransmit
  405ms : issue retransmit request
  530ms : receive retransmit request, retransmit
  655ms : receive packet
  665ms : process information, send response
  790ms : process response

So to cope with a 100ms outage my application needs to insert a 520ms buffer on a 270ms conversation - practically tripling latency, and converting a near-real-time conversation into near-catastrophic talking over each other.

Now this is fine, I can have two circuits, send the packets down both, and the chance of both being broken is minimal -- assuming the links are really separate (which is another challenge. Hint: both paths down the same undersea cable on different frequencies is not separation)

However, getting network providers to acknowledge, let alone measure, outages of 100ms is practically impossible on a country basis, let alone an intercontinental basis.

[1] hyperbole. Just.


Except it does matter. Every 100ms of latency cost Amazon 1% of sales. Every 500ms of latency cost Google 20% of their traffic. And these are 10-year-old metrics, back when people tolerated slowness much more.


It’s mind-boggling how much can happen in 100ms. That’s 6,000 protobuf Kafka messages consumed in Clojure, deserialized, serialized, sent and acknowledged on 2015-era AWS. Upgrading a task caused backlogs of a fraction of a million messages at times, which were quickly caught up on.


The developer/technical community at large seems to be inhuman in their ability to disregard how shitty a user experience can begin to feel once latency exceeds double digits.


TBQH, the "acceptable latency" is whatever is defined in your contract with them.

If the response to that is "there isn't one", well, there's your real problem.


I don't think every performance metric has to be tied to an explicit contract. There are thousands of such metrics in a large organization's tech stack.

What OP is asking is whether the performance they are seeing is comparable to the industry-leading load balancers that HN readers are familiar with. The answer to that question is independent of any contract.


I once asked an engineer I respect deeply “what’s an acceptable timeout for this integration test?” His answer has stuck with me for a long time: “whatever your users will tolerate when they attempt that workflow.”

Software can always be made better and can always be made faster. But you do have to know when to stop, and for me it’s always at “what my users will tolerate.”


Don't push them to the edge every time. Try to have a comfortable margin so they can use your services without it grating on them.


The reason they're asking about acceptable latency is that, unless the contract specifies differently, both parties are bound by what the counterparty can reasonably expect. Each country has contract law like this, with lots of legal precedents on what can and cannot reasonably be expected in certain situations.

Part of the reason there are all these terms and conditions documents that nobody reads is to prevent this ambiguity (to a point). Although legally it's a bit weak because you cannot really expect people to have agreed to all terms (but it does reverse the burden of proof somewhat, so it's still useful).


Fetching a one-TCP-packet file via HTTP should take about 2 RTT_1 from a webserver (ignoring connection teardown). The LB needs another RTT_2 to fetch data over a (hopefully) pre-established TCP connection from your webserver. Assuming a 2ms RTT_2 at the data center, your RTT_1 seems to be about 5.5ms.

A quick Google search indicates that with TLS the whole thing takes at least 4 RTT_1, plus the RTT_2 because the LB still has to fetch your data, so your hard limit is 24ms, plus how long it takes to actually transfer the additional data (especially certificates).

This leaves about 86ms.

Now, does your client perform any checks for a revoked certificate? Assuming these checks are also done via TLS, and a "far away" server with an RTT of 15ms, then that's a whopping 60ms your client is taking to ensure that the certificates are valid. In that case your LB would merely be 26ms slower than the theoretical optimum.
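
Putting those numbers side by side (same assumptions as above):

    theoretical HTTPS floor:  4 x RTT_1 + RTT_2  ~=  4 x 5.5ms + 2ms  ~=  24ms
    observed:                 110ms - 24ms        =  86ms unaccounted for
    revocation check over TLS to a 15ms-RTT host: ~4 x 15ms  =  60ms
    remaining LB overhead:    86ms - 60ms         =  26ms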


A TLS 1.3 handshake adds 1 extra RTT to the normal TCP 3-way handshake (assuming 0-RTT is not in use).


How much network latency is between you and the load balancer, and between the load balancer and the web servers? Normally, the initial TLS connection should be around 4 * the network latency, and subsequent requests should be very close to the network latency.

Give httping [0] a try for benchmarks.

[0]: https://www.vanheusden.com/httping/
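
For example (flags from memory, so check the man page; the URL is a placeholder):

    httping -l -G -c 10 -g https://YOUR_REMOTE_HOST_GOES_HERE/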


Yes, jumping from 15ms to 100ms is rubbish.

To compare network latency in terms everyday techies can follow, use video gaming: 15ms latency is 60 fps, and 100ms latency is 10 fps.

Which game would they want to play?

That said, it's not clear if you mean "time to first byte" or "time to transfer 1000 bytes" or both combined. The first is less of an issue; figure out which part of the request and data transfer timing is changing.

(Disclosure: In a past life I built a white label global content delivery network.)


I downvoted because of the example. It's really bad because latency has no relevance to FPS, and games can actually approach 100ms and work just fine (consider rendering time and TV delay as well).

It's fine because the latency is constant. Your brain internalizes that there is a delay between pressing buttons and things happening. Your brain internalizes that you need to shoot where things are going, not where they are now.

If anything, latency is an example that consistency is more important than speed.


Latency has no relevance to FPS (frames per second, not first person shooter)?

Why do gamers care about not just TV frame rates, but how many frames of delay (latency) come from the TV's video scaler / processor?

When dealing with round trip times (RTT) for TCP or video gaming, latency limits the number of round trips or control loops per second and the gap between input and action.

To your consistency point, agree -- jitter matters a lot to human perception and response too, why so many games "lock" to 30 or 60 fps rather than vary at rates above 30 or 60.

Also, it was not an example. It was an analogy to put the differences in milliseconds into amounts of time that matter to a broader set of people than networking times matter to.


Latency is not related to frames per second, yes.

You could feed a thousand frames per second to a TV and the TV can have 100 ms latency (delay to display a frame after it received it).

You could feed 30 frames per second to the same TV and it'd still have a 100 ms latency.

The latency is rarely advertised. It's highly variable per TV and per image. There's the time it takes to start displaying an image and the time until the image is fully set (it is much longer to go white to black than white to grey).


I take it you're explaining this for our readers.

That said, anyone who wonders why they can't download a file over TCP/IP at gigabit speeds on their gigabit FIOS is realizing there is a link between latency and throughput (network frames per second).

Put another way, latency is not related to number of frames that can be sent at once (bandwidth) but if RTT factors in flow control, those two work together to limit throughput.
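
The back-of-the-envelope version of that link, for a single TCP connection whose window never grows past 64 KB:

    max throughput ~= window / RTT
    64 KB window, 100ms RTT:  65,536 bytes / 0.1s  ~=  5 Mbit/s
    64 KB window,  10ms RTT:  65,536 bytes / 0.01s ~= 52 Mbit/s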


That's the thing, there isn't a link between latency and throughput in a pipe.

TCP is not exactly a pipe. It sends a message and waits for a confirmation back before sending more. That control logic regulates the flow in a pair of pipes, and it's affected by the latency back and forth.

UDP doesn't do that. The Ethernet link doesn't do that. The HDMI link to a TV doesn't do that.


If you're still writing to enlighten readers, this last bit may risk losing them, as it's no longer relevant to their real-world experience. In their real-world experience these are linked: the control logic you acknowledge is affected by latency, and that impacts their download, which is over TCP/IP. But this is a diversion, as I was not attempting to make that connection. I was relating the size of 15ms and 100ms to an experience where people can perceive those intervals.

If, by contrast, you're showing me you understand this, sounds like you do. We had developed an internal tool before Aspera's FASP[1], for our clients to deliver us video, and us to distribute content among hubs at scale, at speeds that saturated links. Understanding the mechanics and limiting factors of information propagation was necessary to be on that team. Turns out most devs don't have a great grip on the physics -- in the most common mental model, everything happens instantly. And so things get slow.

1. https://en.wikipedia.org/wiki/Fast_and_Secure_Protocol


Well, we went from video games to TV to TCP. That's quite a lot of different things, no wonder we lost people on the way. =)

There is one thing I'd like to insist on though: I'm not sure whether people can actually notice 100ms of latency after user input. You assume that users do, but I don't think they necessarily do. I think the full delay between when you press a button on a gamepad and when the frame changes on the TV is somewhere around that, and gaming consoles are generally considered very usable. (Of course latency in computer networks is a different matter.)

My daily job among other things is to optimize software and networking infrastructure covering 4 continents and 50 datacenters. I'd say I have a certain grasp of networking. ;)


It's been 12 days since our last confession, but this Verge review of the Xbox Series X touches on this notion of whether people notice or 'feel' input lag, tied to frame rates (and the interval between them):

> This is where the Xbox Series X really started to feel even more PC-like to me. I play at 165Hz with frame rates that exceed 200fps in games like Destiny 2, Valorant, Call of Duty: Warzone, and CS:GO on my gaming PC. I do this a lot of the time by dropping a lot of the quality settings lower because I personally value frame rates over visual quality. Running around the versus multiplayer mode in Gears 5 in 120Hz felt like I was playing on my PC. With frame rates hitting 120fps, input lag is reduced, and the experience was suddenly so much smoother than what I’ve ever experienced on an Xbox One X.

> That same feeling of PC-like smoothness plays out in Dirt 5 with the 120Hz mode enabled. Sure, the game drops to rendering at 1440p and some of the visual quality is lowered to achieve 120fps, but when I’m sliding around corners and the input latency is reduced, it’s far better than some mud and snow rendering just that little better on my 4K TV to the point I probably wouldn’t notice the difference.

> It’s that feeling that’s really important with this new Xbox, and I can’t stress it enough.

-- https://www.theverge.com/2020/10/15/21515790/xbox-series-x-p...


It depends on exactly what and how you're measuring and why.

TLS has a non-trivial bootstrap cost, due to the need for extra network round-trips during establishment (mitigated with 0-RTT, if available), the additional byte overhead of certificate exchange (which can be substantial relative to small payloads if you have a long chain or are sending unnecessary certs, and more so if you're using RSA keys), and the cost of performing the crypto operations for the public key part and key exchange.

So if your use-case is "client arrives from the blue with a one-off API request and latency is important" then you are going to suffer with TLS.

If this is a web application, on the other hand, then what's actually going to happen is that the browser is going to establish HTTP/1.1 connections over TLS to the load balancer, keep them alive, and re-use them. Assuming a well-configured load balancer, it's also going to have the same from its back-end to your actual service implementation.

However, once that's done, the only necessary overhead is the symmetric key stuff (microscopic) and maybe rekeying in long sessions.

Using curl to do a one-and-done connection to your service (as suggested by another poster) will give you an estimate which is only relevant to the first use case I described.
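
For what it's worth, you can approximate both cases in a single curl invocation, since curl keeps the connection alive across the URLs it's given; on the second transfer the connect/TLS timers should drop to (near) zero (the host is a placeholder):

    curl -s -o /dev/null -o /dev/null \
         -w 'TCP=%{time_connect} TLS=%{time_appconnect} ALL=%{time_total}\n' \
         https://YOUR_REMOTE_HOST_GOES_HERE/ https://YOUR_REMOTE_HOST_GOES_HERE/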


As mentioned, the TLS session is probably not cached yet; subsequent requests should be faster.

However, it's possible Citrix is just not as optimized as other providers/services, or doesn't support newer protocols like TLS 1.3.

That said, 100ms doesn't seem critical.


That's a valuable tip, thanks. I will look into the LB config if there is anything cache-related.


Does your use case need low latency? No? Don't worry about it then. Yes? Does it have a clear, measurable impact on the biz? No? Then you probably won't be able to convince anyone to do anything about it. Yes? Show them the difference and the clear benefits of how this improves customer/business value.


It's a real time trading website, so every millisecond counts.


If it needs to be real time enough for trading then surely even the 15 ms for a non-TLS connection is too much? HFT system latency is measured in nanoseconds these days and I'm pretty sure the HTTP stack has too much overhead to be used in that domain.


I don't think a real-time trading website has anything to do with HFT. Still, some of these trading websites that target day traders as customers need to be pretty fast.


Check the endpoint with one of the SSL tools out there, like SSL Labs: https://www.ssllabs.com/ssltest/analyze.html?d=news.ycombina...

Some of the settings there link to tips for more investigation. Look at the session resumption settings in particular.


Thanks, I did. Here are relevant results: https://ybin.me/p/c73726c16d669a5d#WGH7arFJpP8aitaNcBCGqRP3t...


Have you (or they) made sure the load balancer is using the best options for TLS handshake performance for client facing and server facing connections?

If possible, Elliptic Curve certificates and ECDHE can be a lot faster than RSA certificates and RSA-based DHE, especially if the load balancer is CPU constrained. Make sure TLS sessions or tickets are honored (tickets preferred over sessions, so servers don't need to maintain a session database).

In my personal opinion, I'd rather not have the load balancer terminate TLS, but you lose a lot of features that way, and that might not be an option.
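
A quick way to see what's actually being negotiated, and whether the ADC hands out session tickets (host is a placeholder; the grep patterns match openssl's usual session summary output):

    openssl s_client -connect YOUR_REMOTE_HOST_GOES_HERE:443 < /dev/null 2>/dev/null \
        | grep -E 'Protocol|Cipher|Session-ID:|TLS session ticket'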


I should clarify that 110ms is happening for all requests.

Tested with this: https://ybin.me/p/53f0aa4e1204b470#c561Frz6wpKkfvsTklDdZFOer...

and got this: https://ybin.me/p/1e68f80f6910ba0d#USqFo3Loksx5rHQIe16NHp299...


You're still testing only the initial connection timing this way. Curl does not preserve the TLS sessions between executions by default.

To test session resumption you can do: "curl https://your-url https://your-url https://your-url https://your-url" instead.


I haven't worked with Citrix ADC, but if it's like any other load balancer you want to look at the SNI configuration settings and the TCP profile/buffer size. Increasing the buffer size may help, but in my experience, if you are offloading SSL then the buffer window shouldn't matter, since the HTTPS connection is not being decrypted at the destination. But who knows, maybe ADC requires a larger buffer for HTTPS versus HTTP.


Citrix LBs come in many flavors. At a guess, since you talk about it being externally managed - I'd ask your vendor if you are on dedicated hardware/instance vs being yet another tenant on an (and I might be forgetting my terminology) already oversubscribed VPX instance.

I can tell you from my personal experience that +85ms overhead is definitely not what I'd expect for TLS termination.


Make sure you use HTTP 1.1 or newer and keep-alive. That way, you only have the SSL negotiation overhead on the first HTTPS request, as for all following requests, the connection will be re-used. And then, 100ms for the first request and 15ms for each following request sounds reasonable to me.


Here are some details from SSL Labs checker: https://ybin.me/p/c73726c16d669a5d#WGH7arFJpP8aitaNcBCGqRP3t...


Most people don't care about latency but no, 110ms is not what I'd expect in this case. Take a look at these files hosted by google:

https://developers.google.com/speed/libraries

d3.js on my network is ~200kb and downloads in 20ms with ssl.


95-130ms on mine, it just depends.



