
Boosting Nginx Performance with Thread Pools - gregham
http://nginx.com/blog/thread-pools-boost-performance-9x/
======
smegel
I think this would have been much better titled "Boosting NGINX Performance 9x
with Asynchronous wrappers around blocking system calls".

Most people, when hearing about "thread pools" in the context of a web server,
think about using multiple threads to handle separate requests, which is
NOT what this is about. It is using threads so that some blocking syscalls
(read and sendfile) can run asynchronously from the main event loop.
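The pattern described here fits in a few lines. Below is a minimal Python sketch of the idea (helper names are made up, and this is an illustration, not nginx's actual C implementation): a blocking file read is submitted to a thread pool so the event loop thread never stalls, and the future's completion is handled like any other event.

```python
import concurrent.futures
import os
import tempfile

# A blocking call of the kind nginx offloads (think read()/sendfile()).
def blocking_read(path):
    with open(path, "rb") as f:
        return f.read()

def serve_one(path):
    # The event-loop thread submits the blocking work to a pool thread
    # instead of performing it inline...
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        future = pool.submit(blocking_read, path)
        # ...and would keep processing other connections here.
        return future.result()  # completion notification, like nginx's event

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"hello")
    os.close(fd)
    print(serve_one(path))
    os.unlink(path)
```

In nginx's real design the event loop of course does not wait on a single future; it gets a completion event back through its normal notification mechanism.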

There's already a library for that!
[http://software.schmorp.de/pkg/libeio.html](http://software.schmorp.de/pkg/libeio.html)

~~~
misframer
Similarly, there's libuv [0], which is used for Node.js and others.

[0] [https://github.com/libuv/libuv](https://github.com/libuv/libuv)

~~~
rgbrenner
libuv is for C projects that miss the joy of javascript callback hell.

~~~
anacrolix
Or don't want to have 10000 threads. I have a Go server that regularly bumps
140k goroutines. Try that shit with native threads.

~~~
rgbrenner
libuv isn't unique; it's equivalent to libev + libeio. In fact, that's what
nodejs used before writing libuv. Whether or not it's faster than those is
really case-by-case, but what you'll definitely get with libuv is callbacks
everywhere.

------
pacquiao882
That Flare Mobile crap on this site is constantly applying zoom CSS attributes
and vertically centering the useless sidebar share buttons, making for a
horrible reading experience. The page freezes momentarily any time a scroll
event is triggered. And I'm not even using a mobile device!

A bit ironic since this is an article about reducing blocking for improving
performance.

~~~
jasonatdt
Hi there - I work on Flare. Could you let me know what phone/OS you're using,
so I can take a quick look? jason [at] filament (dot) io. Thanks very much!

------
mdasen
The design seems similar to the one suggested in the 1999 USENIX paper "Flash:
An Efficient and Portable Web Server". It's a good read on the topic. nginx
came about at a time when your site is far more likely to be cached in RAM
than it was in 1999 (along with offloading large files to a CDN/S3 and
reverse-proxying to an app server for much of the rest), but it's nice to
see them improving performance for the bad cases.

[https://www.usenix.org/legacy/event/usenix99/full_papers/pai...](https://www.usenix.org/legacy/event/usenix99/full_papers/pai/pai.pdf)

------
antonios
"On the other hand, users of FreeBSD don’t need to worry at all. FreeBSD
already has a sufficiently good asynchronous interface for reading files,
which you should use instead of thread pools."

Great to hear that.

~~~
simula67
What if Linux had a great asynchronous interface for reading files and FreeBSD
had a terrible one? Would the NGINX team have bothered to implement this?

~~~
justincormack
Originally, as Igor has said in many talks, Nginx was written for FreeBSD and
supports what FreeBSD supports, while the Linux port has historically managed
as best it could. This is a case of actually adding something for Linux
specifically, which is unusual.

So the answer is that it probably would have been implemented years ago if
that were the case.

~~~
ArmTank
Originally. But the focus has shifted since then. There are a number of Linux-
addicted developers in the team now.

------
caf
A slight note on the terminology: reads and writes of ordinary disk files
technically do not "block"; they enter "disk wait" instead. The difference is
visible, for example, in that ordinary files are always considered readable
and writable by select()/poll().
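This is easy to verify directly. A small Python sketch (the helper name is made up): poll() with a zero timeout reports even an empty regular file as immediately readable and writable.

```python
import os
import select
import tempfile

def poll_events(fd):
    # Register the descriptor for both directions and poll without blocking.
    p = select.poll()
    p.register(fd, select.POLLIN | select.POLLOUT)
    return p.poll(0)  # zero timeout: returns only what is ready right now

if __name__ == "__main__":
    # Even an empty regular file is reported ready immediately.
    fd, path = tempfile.mkstemp()
    print(poll_events(fd))
    os.close(fd)
    os.unlink(path)
```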

------
istvan__
I don't think that a load of 172 is a good idea. I know this is a benchmark
measuring how fast you can go ideally, but in production the question is how
fast you can go while keeping latency within the SLA. As a general rule you
want to run your boxes at around a normalized load of 1 (load / # of CPU
cores). The rest of the article is pretty nice.
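The normalized-load rule of thumb above is trivial to compute. A small Python sketch (function name is an assumption; Unix-only, since it uses os.getloadavg()):

```python
import os

def normalized_load(period=0):
    # os.getloadavg() returns the 1-, 5-, and 15-minute load averages;
    # dividing by the core count gives the normalized load the comment
    # suggests keeping near 1.
    load = os.getloadavg()[period]
    return load / os.cpu_count()

if __name__ == "__main__":
    print(f"normalized 1-minute load: {normalized_load(0):.2f}")
    print(f"normalized 15-minute load: {normalized_load(2):.2f}")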

~~~
caf
Remember that processes in (D)isk Wait state count towards the load average
even though they are not running or runnable.

~~~
istvan__
As well as network IO. This is why you can't tell what is going on from the
load alone, and exactly why I don't like it much in production. A box should
have a smooth 15-minute normalized load over time. (If your workload changes
you can autoscale; I think we used the normalized load as the metric for
scaling up and down.)

~~~
noselasd
At least according to the Linux documentation, network IO is NOT part of the
load average; only processes waiting for disk IO and runnable processes count.
------
tracker1
I think it would be cool to see something similar from MS about IIS and .NET,
which have used thread pools for some time, though asynchronous development
has only relatively recently taken hold at the application level (beyond
lifecycle events)...

In practice, I've seen plenty of errant bugs because of race conditions in
sites that start to come under heavy load. I wish more people would take the
time to understand how their platforms work. That said, I've really come to
appreciate the node.js approach.

------
SethMurphy
Great explanation of an event-driven web server. It helped me understand some
of the benefits of the mongrel2 architecture, which completely separates the
tasks to be done by using ØMQ as the mechanism to decouple connection
handling from the message handling of the request.

~~~
dschiptsov
While asynchronous messaging is generally a good idea, using messaging
middleware seems like overkill. One should use the smallest hammer for the job.

Threads were a wrong idea in the first place: by breaking the isolation of
processes (the share-nothing principle), they brought in a whole new class of
problems with locking and synchronization. Only threads that share nothing are
a reasonable choice, but without sharing the whole concept makes no sense
anymore. So there are kernel lightweight processes, which seem to be a good
choice for offloading the blocking operations from the main loop.

BTW, Erlang has done it right from the very beginning.

~~~
vidarh
0MQ _isn't_ a messaging middleware. It's point-to-point (unless you build
your own middleware).

------
ehmuidifici
[https://bugs.launchpad.net/ubuntu/+source/nginx/+bug/890179](https://bugs.launchpad.net/ubuntu/+source/nginx/+bug/890179)

------
IgorPartola
It is really unfortunate that Linux does not do proper async disk IO. Then
again, for lots of websites the static assets stored on disk fit in the OS
cache, so the boost won't really be nearly as big.

~~~
uxcn
I'm not sure what you really mean by this. Linux has supported non-blocking
I/O using _select_ and _poll_ since at least 2.4. 2.6 even added support for
_epoll_, which scales further since readiness notification is _O(1)_.

It's a fairly common practice to spawn _2n_ processes/threads ( _n_ = number
of processors) to allow half of them to block on I/O and system calls, though.

~~~
IgorPartola
Well, TFA talks about Linux not having great support for async IO for the
filesystem. You can use O_DIRECT and get async IO that way, but that
completely bypasses the OS cache, so it's not a great way to do it, at least
not for nginx. Just read the article to see the details.

Note that kqueue(2) in BSD-land supports a unified interface for async IO for
both sockets and files, so you can have a proper event loop without having to
resort to reading files in a thread pool. If Linux had something similar,
nginx wouldn't need to integrate a thread pool for this (though it might for
other things, such as CPU-intensive plugins).

~~~
uxcn
The nginx thread pools aren't strictly for I/O. One of the other major issues
TFA mentions is that plugins don't use _epoll_/_kqueue_, and they block
(with all the associated performance costs).

The detail I apparently skipped is that uncached file reads aren't handled
uniformly through _epoll_ (which I'm surprised about). I don't see why files
should be handled any differently than sockets, etc., with regard to
non-blocking I/O using _epoll_.

Although, my issue is that everyone tends to look to the functions starting
with _aio__ to do asynchronous I/O. Those are fairly bad interfaces (POSIX
AIO) and inefficient (effectively thread pools). Using the nginx model with an
_epoll_/_kqueue_ event loop is a better architecture.
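The non-uniform handling of files is easy to observe directly: on Linux, epoll_ctl(2) rejects regular files with EPERM, so file reads can't even enter the event loop. A small Python sketch (helper name is made up; Linux-only, since it uses select.epoll):

```python
import select
import socket
import tempfile

def epoll_accepts(fd):
    # epoll_ctl(2) fails with EPERM for descriptors that are always
    # "ready", such as regular files; Python surfaces that as
    # PermissionError.
    ep = select.epoll()
    try:
        ep.register(fd, select.EPOLLIN)
        ep.unregister(fd)
        return True
    except PermissionError:
        return False
    finally:
        ep.close()

if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        print("regular file:", epoll_accepts(f.fileno()))  # False (EPERM)
    with socket.socket() as s:
        print("socket:", epoll_accepts(s.fileno()))  # True
```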

------
nly
Hardly exciting stuff; async libraries have been doing this for things like
DNS queries (where there's no portable non-blocking API) for decades. Good
for Nginx addon devs, I guess.

~~~
girvo
Just because it's been done in other tools doesn't make it "hardly exciting"
when an often-used tool that was lacking a feature adds it.

~~~
coldtea
He conflated exciting with "new development in computer science".

------
dschiptsov
Linux has POSIX aio syscalls which seem to work. At least Informix and Oracle
rely on them.

~~~
phs2501
Linux kernel aio will often still block when dealing with the page cache, even
if you request non-blocking behavior. The workaround is to use O_DIRECT, which
is fine for databases that do their own cache management but not for something
like nginx (which depends on the OS cache).

Glibc's posix aio (aio_*(3)), on the other hand, does not use Linux's kernel
aio AFAIK. It probably uses thread pools. It also uses signals to signal
completion. It is not generally considered performant.

~~~
dschiptsov
Yes, good points about caching. Informix does everything by itself, indeed, on
raw devices or direct-mapped files, which is the only way to maintain not
eventual but strong data consistency. Thanks for clarifying.

------
farqueue
403 Forbidden

Well at least it isn't a 503

~~~
kolev
They deleted it for some reason, but here it is from Google Cache:
[https://webcache.googleusercontent.com/search?q=cache:http:/...](https://webcache.googleusercontent.com/search?q=cache:http://nginx.com/blog/thread-pools-boost-performance-9x/)

------
kreutzwj
Thanks, we run a few sites at work that might benefit from implementing this.

------
skorgu
If only the nginx-provided rpm build was built --with-threads.

------
culo
The open-source API management tool KONG
([https://github.com/mashape/kong](https://github.com/mashape/kong)), which is
based on NGINX, uses the same workaround of async wrappers to make requests
faster.

