
Optimizing web servers for high throughput and low latency - nuriaion
https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/
======
jbergstroem
I thoroughly appreciate how the majority of the article doesn't even go into
the nginx config whereas most of the internet would discuss a result of the
search query "optimized nginx config". Much [for me] to learn and much
appreciated, Alexey.

~~~
mfjordvald
One of my more popular blog posts is exactly about how to "optimise" nginx for
high traffic loads, and the gist of it is pretty much "you can't really do
much, but here's some minor stuff". I consider that a good thing, though; a web
server shouldn't really require config tweaking to perform well.

~~~
bogomipz
>"One of my more popular blog posts is exactly about how to "optimise" nginx
for high traffic loads and the gist of it is pretty much "you can't really do
much, but here's some minor stuff"."

Could you provide a link for that blog post? I am not seeing it.

~~~
62747478182
Maybe [https://blog.martinfjordvald.com/2011/04/optimizing-nginx-fo...](https://blog.martinfjordvald.com/2011/04/optimizing-nginx-for-high-traffic-loads/)

------
indescions_2017
Great write-up. And even if you use standard instances there is plenty to
optimize. Kudos to Dropbox, Netflix, Cloudflare, and everyone else who
demonstrates this level of transparency.

And just for reference, AWS does provide enhanced networking capabilities on
VPC:

[http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-...](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html)

------
DamonHD
Fab tour de force! Great!

(Though I've been optimising my tiny low-traffic site on a solar-powered RPi 2
to apparently outperform a major CDN, at least on POHTTP...)

~~~
cortesoft
I am curious how you are measuring that you 'outperform' a major CDN? Average
response time from your local connection? From an arbitrary IP somewhere in
the world? An aggregated average response time from many places around the
world?

~~~
DamonHD
Have a look at these:

[http://www.earth.org.uk/note-on-site-technicals-2.html](http://www.earth.org.uk/note-on-site-technicals-2.html)

[http://www.earth.org.uk/note-on-carbon-cost-of-CDN.html](http://www.earth.org.uk/note-on-carbon-cost-of-CDN.html)

Basically measuring time to first byte (TTFB) (and indeed visually complete
etc) as a reasonable correlate of perceived performance, for a simple site
over http from my RPi vs via CloudFlare fully cached http and https. The most
direct comparison is of a separate fossil site, but other measuring tools show
(generally) my RPi http beats or matches CDN http beats CDN https (with HTTP/2
etc) for a UK visitor.

Tests are made from WebpageTest and StatusCake in the main, both from their
test points in the UK in various data centers. These sites of mine are UK
focussed, not global, so the UK test points are representative, I believe, and
should be advantageous to the CDN since it is terminated closer and faster to
the client than I can be, from my kitchen cupboard!

------
Upvoter33
This is a wonderful article. It also is indicative of why people who really
understand systems will always be so employable - it's just so hard to make
things run well.

------
excitom
This is a great article and I've bookmarked it for future reference.

I observe, though, that if you are tuning a system to this level of detail you
likely have a number of web servers behind a load balancer. To be complete,
the discussion should include optimization of interactions with the load
balancer, e.g. where to terminate https, etc.

~~~
daenney
> To be complete, the discussion should include optimization of interactions
> with the load balancer, e.g. where to terminate https, etc.

Not necessarily. There are other ways of spreading traffic, like DNS round
robin for example, using the DNS delegation trick, or even client-side
load balancing (which is perfectly feasible if you control the client, like
your own app for example).

The article itself mentions that what they're talking about is the Dropbox
Edge network, an nginx proxy tier, which sounds like load balancers to me.

~~~
bogomipz
Using round robin DNS for load balancing is almost never a good idea.

What is "the DNS delegation trick"?

------
exikyut
I have a related question that most would probably consider relevant but which
this article (quite rightly) doesn't answer (as it's not relevant for
Dropbox).

Let's say I want to prepare a server to respond quickly to HTTP requests, from
all over the world.

How do I optimize _where I put it_?

Generally there are three ways I can tackle this:

1. I configure/install my own server somewhere

2. Or rent a preconfigured dedicated server I can only do so much with

3. I rent Xen/KVM on a hopefully not-overcrowded/oversold host

Obviously the 1st is the most expensive (I must own my own hardware; failures
mean a trip to the DC or smart hands), the 2nd will remove some flexibility,
and the 3rd will impose the most restrictions but be the cheapest.

For reference, knowing how to pick a good network (#1) would be interesting to
learn about. I've always been curious about that, although I don't exactly
have anything to rack right now. Are there any physical locations in the world
that will offer the lowest latency to the highest number of users? Do some
providers have connections to better backbones? Etc.

#2 is not impossible - [https://cc.delimiter.com/cart/dedicated-servers/&step=0](https://cc.delimiter.com/cart/dedicated-servers/&step=0)
currently lists an HP SL170s with dual L5360s, 24GB, 2TB and 20TB bandwidth @
1Gbit for $50/mo. It's cool to know this kind of thing exists. _But I don't
know how good Delimiter's network(s) is/are_ (this is in Atlanta FWIW).

#3 is what I'm the most interested in at this point, although this option does
present the biggest challenge. Overselling is a tricky proposition.

Hosting seems to be typically sold on the basis of how fast `dd` finishes
(which is an atrocious and utterly wrong benchmark - most tests dd /dev/zero
to a disk file, which will go through the disk cache). Not many people seem to
set up a tuned web server and then run ab or httperf on it from a remote host
with known-excellent networking. That's incredibly sad!
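For illustration, a sketch of why the typical `dd` test misleads (commands are my own, not from any hosting provider's benchmark):

```shell
# Naive test: zeros land in the page cache, so this mostly measures RAM speed
dd if=/dev/zero of=testfile bs=1M count=1024

# Fairer variants: flush to disk before timing, or bypass the cache entirely
dd if=/dev/zero of=testfile bs=1M count=1024 conv=fdatasync
dd if=/dev/zero of=testfile bs=1M count=1024 oflag=direct
rm testfile
```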

Handling gaming or voice traffic is probably a good idea for the target I'd
like to be able to hit - I don't want to do precisely that, but if my server's
latency is good enough to handle that I'd be very happy.

~~~
tomsthumb
Wouldn't dd'ing /dev/zero create an FS hole in your target file on many file
systems? A similar (but maybe not isomorphic) situation in the Michael Kerrisk
book is one of the exercises, and I'd bet ZFS does similar, but suddenly we've
assumed a lot of context.

~~~
exikyut
Good question. The 0th-step answer is that the people using `dd` are generally
going to be using all OS defaults, including for the filesystem.

For what it's worth, I know that dd'ing /dev/zero makes df show a smaller
value. AFAIK, df (for ext4 in my case) isn't reading a logical abstraction of
capacity as arbitrarily decided upon by a bunch of layers, but is
straightforwardly reporting the free blocks on disk.
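For what it's worth, the hole question can be checked directly (a sketch; plain `dd` from /dev/zero writes real blocks unless you ask for sparseness explicitly):

```shell
# Writing zeros with dd allocates real blocks, which is why df shrinks
dd if=/dev/zero of=zeros.img bs=1M count=16
du -h zeros.img    # on-disk usage matches the 16M apparent size

# conv=sparse skips zero blocks, creating an FS hole instead
dd if=/dev/zero of=sparse.img bs=1M count=16 conv=sparse
du -h sparse.img   # near-zero on disk, though ls -l still reports 16M
rm zeros.img sparse.img
```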

------
siner
"If you’ve read this far you probably want to work on solving these and other
interesting problems! You’re in luck: Dropbox is looking for experienced SWEs,
SREs, and Managers."

Following the link shows "Open Positions" with, well, nothing to follow.
They optimized not only their servers for throughput but their HR as well!

------
Thaxll
"people are still copy-pasting their old sysctls.conf that they’ve used to
tune 2.6.18/2.6.32 kernels."

This.

~~~
korzun
.. or 32 bit systems.

------
tiffanyh
My mind kind of explodes reading that article.

So many bells & whistles and I don't even know where to begin.

------
doubleorseven
wow. This is the kind of article I'm expecting to see when I google for "Nginx
best practices 2018". I am so far behind; maybe 20% of my usual setup includes
those recommendations. Thank you Dropbox.

If someone can point me to a thorough article like this on the lua module, I
will thank her/him forever.

~~~
etiene
Maybe you will find some stuff here?
[https://openresty.org/en/presentations.html](https://openresty.org/en/presentations.html)

------
leetrout
I'm surprised there was no mention of tcp_max_syn_backlog and
netdev_max_backlog.

When I've previously tuned a server I have used both of those to my
advantage... Another comment on here talked about this ignoring an existing
load balancer so maybe those sysctls are more appropriate on an LB?
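For reference, a sketch of how one might check whether those backlogs are actually a bottleneck before raising them (the values below are illustrative, not recommendations):

```shell
# Current values (defaults vary by kernel/distro)
sysctl net.ipv4.tcp_max_syn_backlog net.core.netdev_max_backlog

# Evidence of overflow: listen-queue drops and softnet backlog drops
netstat -s | grep -i 'listen'                   # "SYNs to LISTEN sockets dropped" etc.
awk '{print $2}' /proc/net/softnet_stat | head  # 2nd column = per-CPU backlog drops

# Only raise them if the counters above keep growing (needs root)
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
sysctl -w net.core.netdev_max_backlog=5000
```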

~~~
korzun
Those settings are irrelevant in this scenario. The point is to maximize the
throughput from a single server without backing up the queue.

------
mixedbit
Does the physical location of a server matter for high-throughput use cases?
If a client is downloading large files using all its available bandwidth, is
the download time noticeably better if the server is close to the client?

~~~
SaveTheRbtz
Physical location matters a lot for uploads, mostly because the client TCP
stack is way less sophisticated.

For downloads, high RTT can be mitigated by a congestion control that ignores
constant packet-loss rates (which are common for high-RTT paths). Other tricks
that you can try: fq+pacing and newer kernels with more sophisticated recovery
heuristics.
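A minimal sketch of the fq+pacing suggestion (eth0 and BBR are my assumptions here; BBR needs kernel 4.9+, and the interface name is a placeholder):

```shell
# Pacing comes from the fq qdisc on the egress interface
tc qdisc replace dev eth0 root fq

# Make fq the default and switch to a loss-tolerant congestion control
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr

# Verify which congestion controls the running kernel offers
cat /proc/sys/net/ipv4/tcp_available_congestion_control
```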

------
skarap
One congestion-control parameter which doesn't get its fair share of articles
is initcwnd. And for HTTP traffic over high-bandwidth, high-latency links it
has a much larger impact than the choice of the cc algorithm.

See [https://www.cdnplanet.com/blog/tune-tcp-initcwnd-for-optimum...](https://www.cdnplanet.com/blog/tune-tcp-initcwnd-for-optimum-performance/)
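On Linux, initcwnd is set per route rather than via sysctl; a sketch (the gateway and device below are placeholders for your own default route):

```shell
# Show the default route; if initcwnd is absent, the kernel default applies
# (10 segments since kernel 2.6.39, per the IW10 change)
ip route show default

# Bump the initial congestion (and receive) window on the default route
ip route change default via 192.168.1.1 dev eth0 initcwnd 20 initrwnd 20
```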

~~~
SaveTheRbtz
I've mentioned it very briefly in the Network Stack part, around the "upgrade
your kernel" warning:

> and I’m not even talking about IW10 (which is so 2010)[1]

[1]
[https://developers.google.com/speed/protocols/Increasing_TCP...](https://developers.google.com/speed/protocols/Increasing_TCPs_Initial_Window_IETF78.pdf)

------
justin66
> You should keep your firmware up-to-date to avoid painful and lengthy
> troubleshooting sessions. Try to stay recent with CPU Microcode,
> Motherboard, NICs, and SSDs firmwares.

I wonder if this is good advice. I would have said the opposite: do not mess
around with any of that stuff unless there's a security advisory or a problem
points to a specific piece of hardware. It's not like updating this stuff is
without risk.

~~~
SaveTheRbtz
> It's not like updating this stuff is without risk.

Same goes for kernel, libc, all other libraries, and pretty much anything
else. This is not an argument for not upgrading them though.

Also, sadly, almost all firmware updates fix a "problem [that] points to a
specific piece of hardware" that you use.

~~~
justin66
> Same goes for kernel, libc, all other libraries, and pretty much anything
> else. This is not an argument for not upgrading them though.

If you have a budget to responsibly keep all that stuff up to date in your
project and test to make sure you haven't broken anything, more power to you.
It's a total waste most of the time, but why not? Especially if you're a
consultant and getting paid by the hour.

Firmware is different. It's not always possible to back out a change and you
risk bricking hardware every time you apply an update, especially in the realm
of PC-based servers. If something like a NIC or motherboard is performing as
expected, updating the firmware for no reason is generally a stupid thing to
do.

Edit: oh, you're the author. Please stop telling people to do this, or at
least explain the risk. At a minimum, if the vendor provides a defect list
with each update, it's not necessary to blindly apply updates. Take the
changes if they're needed. It's possible you're accustomed to working on high-
grade hardware that exhibits fewer of these firmware related blowups, but that
doesn't mean it does not happen to people...

------
z3t4
It would be cool to see a story like "we did these adjustments so each server
could handle 10% more requests", etc. This blog post seems to only cover
software; you can also gain a lot of performance from hardware modding.
Someone said optimizing is the root of all evil ... So first identify
bottlenecks in real workloads, no micro-benchmarking!

~~~
raarts
That was: "_premature_ optimization is the root of all evil".

------
rcchen
> If you are going with AMD, EPYC has quite impressive performance.

Does this imply that Dropbox has started testing out EPYC metal?

------
nvarsj
This is really great. I love these kinds of articles! I learned a few things
(like about the non-cubic tcp congestion algorithms). Probably orthogonal to
TFA, but I'd be interested to know, though, how they solve zero-downtime nginx
deployments.

------
pavs
No mention of VPP - does it not apply to applications? Or routing/switching?

[https://wiki.fd.io/view/VPP/What_is_VPP%3F](https://wiki.fd.io/view/VPP/What_is_VPP%3F)

~~~
wmf
VPP has amazing performance, but it's mostly limited to switching, routing,
NAT, and VPN at this point; it doesn't have (finished) TCP and porting a Web
server like Nginx is a ways off. It's also questionable who needs something
like serving 500 Gbps of Web traffic from a single server.

------
stuxnet78
Interesting article. Well, to be honest, some of the concepts were totally new
to me, and I found learning about them from this article interesting. And
thanks for the other links too.

------
alinspired
Great write-up for traditional/kernel tuning! I guess I'm naively waiting for
DPDK-based user-space solutions to appear.

------
abc_lisper
Just goes to show how much one should know in our field to make the machine
work well for you. For somebody who can understand the article, the stuff is
mostly known, but if you don't know it, the article is pretty dense.

It would be nice if someone made a Docker image with all the tuning set
(except the hardware).

It would have been nicer if the author had shown what the end result of this
optimization looks like, with numbers, compared against a standard run-of-
the-mill nginx setup.

~~~
eugeneionesco
>It would be nice if someone made a Docker image with all the tuning set
(except the hardware)

Have you not read the article?

>In this post we’ll be discussing lots of ways to tune web servers and
proxies. Please do not cargo-cult them. For the sake of the scientific method,
apply them one-by-one, measure their effect, and decide whether they are
indeed useful in your environment.

~~~
TomMarius
I don't think they meant for production, just testing and toying with it.

------
mfonda
Great post! Contents of it aside, I very much like the disclaimer:

> In this post we’ll be discussing lots of ways to tune web servers and
> proxies. Please do not cargo-cult them. For the sake of the scientific
> method, apply them one-by-one, measure their effect, and decide whether they
> are indeed useful in your environment.

Far too often I see people apply ideas from posts they've read or talks they've
seen without stopping to think about whether it makes sense in the context
they're applying it. Always think about context, and measure to make sure it
actually works!

------
brendangregg
Excellent!

------
dragonwarrior
Very interesting

------
mozumder
So how does Linux compare now with FreeBSD in terms of throughput and latency?
I remember like 10 years ago Linux had issues with throughput, which is why
Netflix went with FreeBSD. Are they similar now?

~~~
drewg123
So, this is an honest question. What kind of performance do Linux based CDNs
get out of a single box?

At Netflix, we can serve over 90Gb/s of 100% TLS encrypted traffic using a
single-socket E5-2697A and Mellanox (or Chelsio) 100GbE NICs using software
crypto (Intel ISA-L). This is distributed across tens of thousands of
connections, and all is done in-kernel, using our "ssl sendfile". Eg, no dpdk,
no crypto accelerators.

I'm working on a tech blog about the changes we've needed to make to the
FreeBSD kernel to get this kind of performance (and working on getting some of
them in shape to upstream).

~~~
peterwwillis
It doesn't hurt to get a performance boost from your processor's crypto
instructions, assuming you optimized your cipher lists to prefer crypto with a
modern hw implementation (AES128-NI is 179% faster than RC4).
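One quick way to see the hardware-crypto gap yourself (a sketch using openssl's built-in benchmark; the EVP path picks up AES-NI when the CPU has it, while the legacy path stays in generic C):

```shell
# AES-GCM through the EVP interface: uses AES-NI/PCLMUL where available
openssl speed -evp aes-128-gcm

# Legacy interface for comparison: generic C implementation of AES-CBC
openssl speed aes-128-cbc
```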

But is this traffic ongoing connections, new connections, a mix? They have
different penalties, and result in different numbers: 90Gbps of ongoing
connections might be, like, 100,000hps, but 90Gbps of new connections during
primetime might only net you 50,000hps. And are you using Google's UDP TLS
stuff?

Google also hacked on the kernel a lot to improve their performance, I don't
know if any of that's upstream currently though. Maybe Cloudflare can answer
you, as they seem to support the most HTTPS wizardry of the big CDNs.

~~~
drewg123
Yes, as I said, we're using ISA-L. This is Intel's library with thin wrappers
around hand-tuned assembly routines for crypto.

The traffic is mostly long-ish lived connections. Eg, the duration of a TV
show or movie. So there is some churn, but not a lot.

This is all TCP. By "UDP TLS", I assume you mean Quic?

------
korzun
You can skip all of that nonsense and run FreeBSD.

~~~
SaveTheRbtz
When I was at Yandex, we were a FreeBSD-CURRENT shop (FreeBSD 9 at that time).
That said, FreeBSD had exactly the same[0] issues at the low/mid levels, just
not all of them had a good solution[1][2].

[0]
[https://wiki.freebsd.org/201305DevSummit/NetworkReceivePerfo...](https://wiki.freebsd.org/201305DevSummit/NetworkReceivePerformance/ComparingMutiqueueSupportLinuxvsFreeBSD)

[1]
[https://wiki.freebsd.org/NetworkPerformanceTuning](https://wiki.freebsd.org/NetworkPerformanceTuning)

[2]
[https://wiki.freebsd.org/TransportProtocols](https://wiki.freebsd.org/TransportProtocols)

------
hartator
I wish they could make a MacOSX app that doesn't use almost 100% of one core
all the time.

~~~
cabaalis
Sorry you're downvoted, but what you mention is a real concern.

~~~
ovao
Not relevant to this topic.

------
cat199
great article overall - but it starts off by saying 'do not cargo cult this'
and then proceeds to prescribe many 'mandates' without giving any rationale
behind them..

~~~
SaveTheRbtz
I've tried to:

1) give a bcc/perf example to check the need for tuning and verify its
effect.

2) give code/docs/paper reference as an embedded link.

3) give a generic monitoring guideline at the start of a "chapter".

Seems like I've (at least partially) failed. I'll do better next time.

------
Unisecure
Know these Linux backup best practices to avoid data loss.
[https://buff.ly/2gJK5Ew](https://buff.ly/2gJK5Ew)

