

Serving small static files: which server to use? - peterbe
http://nbonvin.wordpress.com/2011/03/24/serving-small-static-files-which-server-to-use/

======
unshift
i don't really see the point of these micro-benchmarking articles at all.

so what if nginx can theoretically serve a higher number of static
files/second than something else. are you actually serving that much
traffic with no headroom in terms of extra servers and load balancers? do
microseconds of computational time per request really matter when your
outbound packets can get delayed by milliseconds in the network or dropped
completely?

there are plenty of reasons to like one server over another, but is
.0000000001 seconds/request overhead really one of them? http servers can have
wildly different behaviors regarding HTTP streaming, worker models,
extensions, etc. how about the fact that varnish is a caching proxy that
doesn't really replace something like nginx, lighttpd, apache?

he's also backing varnish with a ramdisk that takes 25% of his memory (for a
100b file, no less!) when comparing it to the others. probably not the best
designed test out there.

> Again, keep in mind that this benchmark compares only the servers locally
> (no networking is involved), and therefore the results might be misleading.

i don't know why anyone would publish "misleading" benchmarks

i know it's less fun and there are no numbers involved, but what about a real
rundown of some of the subtle differences between the servers and some of
their more unique features (besides async/threaded)? that's something i would
find useful reading, but i guess it's not as easy as firing up ab.

~~~
drtse4
And in the end, the only meaningful benchmark for you is the one performed on
your deployment environment and with your application. These generic tests (if
the conditions in which they were performed are clear) can only be used to
skim through all the available options and identify those that clearly under-
perform (agreed, identifying why could be even more useful than the test itself).

~~~
unshift
they're not even worth skimming.

a while back when tornado (for python) came out, there was a whole slew of
benchmarks comparing it to twisted. as a guy who uses twisted a lot, i was
interested. all the benchmarks said tornado was faster by maybe a hundred or a
couple hundred requests/sec and therefore was the superior framework.

the big question is, what difference does it make? not a single article i read
mentioned the fact that twisted has a really awesome streaming API, has great
TCP and UDP level support, has support for a ton of other protocols (and
writing your own), or is insanely useful for non-web projects. i never read
about a single feature tornado had either, or why one is worth investing time
in over the other.

same thing for this "benchmark". i might as well write off varnish, since it
serves fewer requests/sec than nginx, right? wrong! it's a different thing
altogether -- no mention of that anywhere in the article though. the author just
says (in not so many words) it's a piece of crap compared to nginx.

~~~
qjz
If you're trying to identify and eliminate bottlenecks, benchmarks like this
are tremendously helpful. If the theoretical limit of a component exceeds the
practical limit of other resources, I know I can look elsewhere to improve
performance.

------
drtse4
Old post. As a side note, i performed some tests with nginx and his
configuration 1-2 weeks ago on linode, and the results on the smallest linode
were only 10-15% lower than what the author reports in his post (quite good
imo).

If someone with a less optimized configuration is wondering what in his test
configuration allows him to obtain those results, here is a brief recap:

1- Tests performed with ab with keepalive enabled on both the client and
server

2- open_file_cache or similar options: this enables file caching, so basically
the server is no longer i/o bound

3- Furthermore, enabling tcp nodelay (which disables nagle's algorithm, useful
when we have small tcp responses) and disabling access logging (this depends on
how logging is implemented: if it's non-blocking and on a separate thread (not
a worker), disabling it doesn't improve the results) could help a bit.
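The recap above maps onto a handful of nginx directives. A minimal sketch of such a configuration (directive values are illustrative guesses, not the author's actual config):

```nginx
worker_processes  1;

events { }

http {
    sendfile           on;
    tcp_nodelay        on;     # point 3: disable nagle for small responses
    keepalive_timeout  65;     # point 1: keep-alive (ab also needs -k client-side)

    # point 2: cache open file descriptors/metadata so the disk is rarely hit
    open_file_cache          max=1000 inactive=30s;
    open_file_cache_valid    30s;
    open_file_cache_errors   on;

    server {
        listen      8080;
        root        /var/www;
        access_log  off;       # point 3: skip per-request logging
    }
}
```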

Being a cpu-bound test, having the client on a separate machine would likely
have increased the results, but i doubt it would have changed the performance
ratio among them; after all, in every test we had the same client with the
same overhead.

~~~
justin_vanw
I don't know what open_file_cache does specifically, but if you have enough
memory to cache the file, then linux would have enough memory to cache it as
well, right? In that case, you aren't really overcoming being IO bound, but
rather avoiding the file opening/closing overhead?

~~~
drtse4
The open_file_cache_* options in nginx (but other servers have similar options)
allow caching the file in memory, so the disk is used only when the cached
value is no longer valid (a 30s lifetime in the linked tests). After a quick
refresher on how the buffer cache works, i'd say that you are correct, and
that with the option above the server is also using a simpler data structure
to retrieve those cached pages (no block lists to traverse, no access locks).
The difference in performance with and without this option was huge in a quick
test: enabling the server-side caching, i saw the number of req/s increase
400%-500%.
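The open/close overhead discussed above is easy to see even from userspace. A quick sketch (hypothetical 100-byte file; the absolute numbers will vary by machine) comparing a full open/read/close per request against serving the same bytes from an in-process cache:

```python
import timeit

PATH = "/tmp/100b.html"  # hypothetical test file, like the 100-byte file in the article
with open(PATH, "wb") as f:
    f.write(b"x" * 100)

def read_from_disk():
    # one open/read/close per "request"; the OS page cache still serves the
    # data, but we pay the syscall and file-table overhead every time
    with open(PATH, "rb") as f:
        return f.read()

CACHE = {PATH: read_from_disk()}

def read_from_cache():
    # server-side caching: no syscalls at all, just a dict lookup
    return CACHE[PATH]

n = 100_000
disk = timeit.timeit(read_from_disk, number=n)
cached = timeit.timeit(read_from_cache, number=n)
print(f"disk: {disk:.3f}s  cache: {cached:.3f}s  ({disk / cached:.0f}x)")
```

Even with the page cache hot, the dict lookup wins by a wide margin, which is consistent with the large req/s jump reported above.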

------
robtoo
previously: <http://news.ycombinator.com/item?id=2629631>

Also, "The client as well as the web server tested are hosted on the same
computer", which is pretty poor design, to be honest.

~~~
palish
_"Doing a correct benchmark is clearly not an easy task. There are many walls
(TCP/IP stack, OS settings, the client itself, …) that may corrupt the
results, and there is always the risk to compare apples with oranges (e.g.
benchmarking the TCP/IP stack instead of the server itself)."_

By hosting the web server and the client on the same computer, he is testing
one aspect in isolation from others. This is a good thing, and is generally
the way in which scientific tests advance knowledge.

~~~
robtoo
Except he isn't just testing one aspect in isolation from others. He's actually
introducing an entirely new aspect (the client), which is a substantial load
and won't be there in production.

~~~
peterwwillis
the extra load incurred is the same across all tests (same client, same args,
same tuning), so the results are still valid, just not the highest performance
possible.

~~~
robtoo
Complex systems have complex interactions. You can't just hand-wave this
away by claiming that the interactions will be identical for all cases without
actually demonstrating it.

~~~
peterwwillis
Ok, so as the client processes more requests with the server the resource use
increases, so in theory the higher the benchmark numbers the faster the server
would actually respond without the extra load of the client. So (in theory)
the server with the highest performance actually performs better than
perceived (assuming that the tester is hitting resource bottlenecks somewhere
on his server during the test, which isn't shown).

Luckily this benchmark is incredibly simple. It's not a complex system as the
test is using a single set of data with two pieces of software in a single
contained environment; the only thing that changes is one piece of software
and one configuration: the server. Separate the server/client and your test is
still the same, only with extra resources for the server and client to take
advantage of (and less network bandwidth and higher latency). Knowing how http
clients work, and knowing how http servers work, is it possible that the
client or server could be utilizing resources in such a different way after
being separated as to skew the results in a significant way?

I don't believe so. Even if you saturated a 1Gbps network link, you will see
differences in CPU time between processing of requests and differences in
memory use, and unless they are all fast enough to saturate that link you will
see some servers process more requests than others. If you want to verify this
you can follow the benchmark's set-up and try on two separate machines and let
us know if there's a significant difference.

------
joss82
Look at those memory usage graphs, does this mean those servers are leaking
memory?

With the notable exception of nginx, of course.

~~~
Luyt
He writes:

 _"Regarding the resources used by each server, Nginx is the winner in term of
memory usage, as the amount of memory does not increases with the number of
concurrent clients."_

So I guess the memory consumption is caused by the number of concurrent
clients, and not by a memory leak.

~~~
joss82
You are right. Then it would be interesting to see how much of the memory is
freed after all the clients disconnected.

------
justin_vanw
I think benchmarks like this are very harmful. How many small static files you
can serve per second is just one (not very important) criterion when choosing
one of these servers.

I think more important criteria are:

1\. Stability. How often are you woken up in the middle of the night because
your web server is shitting the bed?

2\. Configuration. Can you configure it to do all the things you will need it
to do? Have others who have come before you been happy with it throughout the
entire life of their product, or have they outgrown it?

3\. Simplicity. Can you set it up to run efficiently without weeks of study on
how this server is properly deployed? Is it easy to mess up the configuration
and take your site down when making a change?

4\. Generality. Are you going to need something else to sit in front of your
dynamic pages, if you require them? This is also a factor in stability: if you
have 2 server solutions, all else being held constant, that is twice as likely
to break down or get broken during a configuration change as just one.
Actually, it is much more than twice as likely, since you are spreading your
effort to learn the ins and outs across 2 pieces of software, so you are less
capable on each than you would have been if you just had one server solution
to worry about.

So, given all this, my advice to anyone trying to make an initial decision on
what webserver to use is: (Apache|nginx) (pick one only) should be your
default until you believe you have a compelling _reason_ to use something
else. Both are capable of doing more or less everything you need, have lots of
extensions, are widely used, and have comprehensible configuration. Once you
have mastered whatever one you use, you will be able to tune it, debug
performance problems, and spend the minimum possible amount of time doing
server configuration and testing, and maximum time implementing features and
supporting customers.

------
qjz
I've tested and admired the performance of G-WAN, but the closed-source nature
of the project may be a bit of a showstopper for some. Development appears to
be narrowed to Debian derivatives, making successful installation of the
binary on other Linux/UNIX platforms challenging. It would be nice to be able
to inspect and modify the source in order to optimize and compile it for the
desired platform.

------
mseebach
These results are valid if your static content all fits in memory. I would
expect interesting divergences in performance if a certain proportion of
requests had to hit the file system.

Also an interesting piece of noise missing is slow clients holding on to the
connection. If you're serving up multi-megabyte files, I would guess this
could become a major factor.

~~~
peterwwillis
part of why you use caching servers or proxies is to avoid hitting the
filesystem, since it's very costly compared to memory (or even network to an
extent). the ramdisk in this case is performing the function of the caching
server but at much increased speed and much reduced overhead.

afaik the only way you could get over a couple thousand RPS while reading from
the filesystem is the operating system's inherent caching in the VFS and disk
buffers.

and in an asynchronous web server, connections should not be held up by slow
clients. if they use a model where one thread or process handles each client
connection you could definitely get starvation of resources or connections,
but asynchronous models should just process stuff as it comes in and not
"wait" on a slow client.

------
mckoss
Off topic, but WordPress has made blogs unreadable for iPad users. I can't
even scroll through this article without the screen jumping erratically past
many pages of content.

