
600K concurrent HTTP connections with Clojure and http-kit - nashequilibrium
http://shenfeng.me/600k-concurrent-connection-http-kit.html
======
danielrhodes
Add an endpoint with a poorly optimized SQL query: 20 concurrent connections.

I find these benchmarks not to be very valuable. Typically the bottlenecks are
in the database, not in the web server/code. Even when they are in the code,
optimizations only become truly valuable when you are at scale or doing
something either horribly wrong or algorithmically unique.

~~~
Legion
> I find these benchmarks to not be very valuable.

It's interesting to take one piece of the stack and see how much abuse it can
take, even if it's not necessarily reflective of real world behavior.

~~~
khet
I am afraid such out-of-context benchmarks will do more harm than good.

~~~
xt
I like what agentzh has done with Lua + nginx. His benchmarks also showcase
how it performs compared to other popular solutions. Check it out at
<http://agentzh.org/misc/slides/libdrizzle-lua-nginx.pdf>

------
mbell
Maybe I'm not understanding what is going on here, but it appears the author's
client is communicating over a socket on the same machine as the server. The
author is seeing insanely high numbers because she/he is bypassing the entire
TCP/IP stack.

~~~
javajosh
Running locally doesn't mean you bypass TCP/IP.

~~~
mbell
While true, given this setup I suspect that some layer is just dropping down
to a Unix socket when it notices localhost.

Most notably, 600k active real TCP connections in the kernel would use
somewhere around 6GB of memory, assuming an average of ~10kB of memory per
socket for the R/W buffers and other data. EDIT: That's just the server side;
double it to include the client sockets.

Most attempts I've seen at this number of real TCP connections required a lot
more tweaking of kernel TCP settings to achieve.
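For a rough sense of where that per-socket memory estimate comes from, here is a small sketch (plain JDK, nothing http-kit-specific) that asks the OS for its default socket buffer sizes. The actual numbers depend entirely on your kernel and sysctl settings, and the kernel fills buffers lazily, so real per-socket usage is typically well below these caps:

```java
import java.net.Socket;

public class SocketBufferSizes {
    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket()) { // unconnected; just queries OS defaults
            int rcv = s.getReceiveBufferSize(); // default SO_RCVBUF
            int snd = s.getSendBufferSize();    // default SO_SNDBUF
            System.out.println("default SO_RCVBUF: " + rcv + " bytes");
            System.out.println("default SO_SNDBUF: " + snd + " bytes");
            // These are per-socket ceilings; the kernel allocates lazily,
            // which is why an *average* of ~10kB/socket is plausible even
            // when the configured defaults are much larger.
        }
    }
}
```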

~~~
alexkus
For large numbers (1M+) of connections (doing websockets or long polling) to
replicate the functionality of a COMET server, I've been playing with rolling
my own TCP handling via libnetfilter_queue, as I simply don't need the ~10kB
of r/w buffers on each socket.

Linux (as far as I'm aware) doesn't allow tuning of r/w buffer sizes on a
per-interface basis; otherwise I'd have one interface for the COMET server
with drastically reduced r/w buffer sizes and the remaining interfaces with
'normal' TCP r/w buffer sizes, to ensure the other things running on that host
run without problems.

------
shenedu
Hey, http-kit's author here. Willing to answer any questions.

~~~
cgag
What's different about http-kit vs say netty that allows it to handle so many
connections / be so fast?

~~~
shenedu
Linux's epoll and FreeBSD's kqueue are ridiculously scalable, and both
http-kit and netty take advantage of them. http-kit is all about HTTP; netty
is a general framework.
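A minimal sketch of the readiness-selection model both servers are built on (plain `java.nio`, not http-kit's actual code; on Linux the JDK backs `Selector` with epoll, on BSD/macOS with kqueue). One thread registers interest in many sockets and only touches the ones the kernel reports ready:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class SelectorSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        // One client standing in for the hundreds of thousands a real test uses.
        SocketChannel client = SocketChannel.open(server.getLocalAddress());
        client.write(ByteBuffer.wrap("ping".getBytes()));

        boolean accepted = false, read = false;
        while (!read) {
            selector.select(); // block until the kernel reports readiness
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                    accepted = true;
                } else if (key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    ((SocketChannel) key.channel()).read(buf);
                    read = true;
                }
            }
        }
        System.out.println("accepted=" + accepted + " read=" + read);
        client.close(); server.close(); selector.close();
    }
}
```

The key property is that idle registered sockets cost nothing per `select()` call; the loop's work scales with the number of *ready* sockets, not the number of open ones.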

------
shenedu
Author here. The server is open source, on github:
<https://github.com/http-kit/http-kit> The test code is also on github:
<https://github.com/http-kit/scale-clojure-web-app>

Suggestions very welcome!

------
stiff
The server itself seems to be written in Java:
<https://github.com/http-kit/http-kit> I wonder what the factors were for not
writing it in Clojure, if Clojure is the target platform.

~~~
shenedu
Author here. On the Java part:

1. Java NIO's performance is amazing: event-driven reading/writing of bytes.

2. Maintaining the state machine when parsing HTTP from a byte buffer needs
many local variables, and I'm used to doing that in C-style code.

Clojure: I like this language. It's brilliant, so I wrote a fast HTTP
server/client for it. Clojure is also written in Java, which makes for great
interoperation.

~~~
nqzero
Can the http-kit server be used from Java directly? Last I looked, all the
async Java web servers seemed limited to 1000s of connections, which largely
defeats the purpose. I'm willing to give up the servlet spec if needed.

~~~
shenedu
Hey, http-kit can be used from Java directly: [https://github.com/http-kit/http-kit/blob/master/test/java/o...](https://github.com/http-kit/http-kit/blob/master/test/java/org/httpkit/server/MultiThreadHttpServerTest.java)

Not recommended, though! The API is very low-level.

Maybe you can try tweaking the max allowed open files to a larger value. The
default is about 1024; that's how you end up with only 1000s (I guess). Jetty
is quite good at concurrency; you can double-check it.

Why not try Clojure? Your web dev productivity will instantly increase a few
times.

~~~
nqzero
Looks good. My application (database-ish) is in Java and I haven't worked on
any bindings yet. Just trying to put together a demo that shows off the
concurrency.

------
zwischenzug
Similar article here, going to 11:

[http://www.metabrew.com/article/a-million-user-comet-
applica...](http://www.metabrew.com/article/a-million-user-comet-application-
with-mochiweb-part-1)

~~~
jadc
+1 for the C1M comet application

The article provides a lot more in-depth information on the subject and more
experimental data (across multiple machines - not only localhost).

------
ptaoussanis
Feng (http-kit's author) should be around shortly if anyone has any questions.

In the meantime, you can also check out <http://http-kit.org> for more info.
That page is a work-in-progress so please excuse any errors.

We were actually planning to post to HN later this week; seems someone beat us
to the punch :-)

~~~
egeozcan
Don't get me wrong, I really like what you are doing, but there are a lot of
spelling and grammar errors on your website. I'm not a native speaker (I also
make a lot of errors), but maybe you can get some help from one? Not a big
deal, but I just wanted to let you know. Makes sense if you want to get
popular =)

I also have a question: Do you plan to make any comparison tests with
Compojure (<https://github.com/weavejester/compojure>) and lib-noir?

~~~
ptaoussanis
The library author is Chinese, so English isn't his native language. We'll be
cleaning up typos soon - this post caught us by surprise.

As for Compojure, etc.: those are libraries that operate _on top of_ a Ring
web server. Jetty is the default; http-kit is a drop-in replacement. So
basically, you'd use _both_ http-kit and whatever other libraries you normally
would (like Compojure).

I swapped out a production Jetty+Compojure app to use http-kit+Compojure by
changing ~20 lines of code.

Hope that makes sense?

~~~
egeozcan
Yes it does, thanks for the clarification.

------
billiob
This reminds me of [http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2...](http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2011/)
where they handled 2 million concurrent TCP connections.

------
dotborg
Once you put a little bit more real code into those requests, your JVM will
die from continuous GC. It's simple math :)

------
z3phyr
Anybody heard of Joxa? Think Erlang BEAM features in a Clojure-like language
with Scheme-like simplicity.

------
jlward4th
I've created the same test app with Play 2.1 RC2 and Scala:
<https://github.com/jamesward/scale-play-web-app>

On my laptop I have half as many cores as the poster and am getting about half
the performance.

------
leoh
This looks really cool. But can Apache and other frameworks accomplish this?

~~~
rorrr
1) Notice that in his tests 97% of these connections don't do anything; they
just idle. He maxes out at 18764 req/sec. If you google around, Apache and
Nginx can do more than that on a beefy server.

2) Notice that they are "keep-alived", coming from the same IP, so not truly
separate connections.

3) Keep in mind that 600K concurrent connections cannot possibly do anything
useful at the same time for many reasons (CPU, bandwidth, server I/O), so they
are not truly concurrent.

4) Max concurrent connections are also limited by the OS, and that limit is
much much lower than 600K by default:

[http://serverfault.com/questions/10852/what-limits-the-
maxim...](http://serverfault.com/questions/10852/what-limits-the-maximum-
number-of-connections-on-a-linux-server)

~~~
shenedu
author here.

> Notice that in his tests 97% of these connections don't do anything, just
> idle. He maxes out at 18764 req/sec.

Yes, this is just testing how many concurrent connections can be held. While
the 600k are held, ab confirms that it can do about 31405.53 requests per
second; the HTTP body is 1024 bytes.

> Notice that they are "keep-alived", coming from the same IP, so not truly
> separate connections

Not from the same IP; from many IPs: 192.168.1.200~230

> Keep in mind that 600K concurrent connections cannot possibly do anything
> useful at the same time for many reasons (CPU, bandwidth, server I/O), so
> they are not truly concurrent

They send a request to the server every 5s~30s and wait for the response.

~~~
rorrr
> _Not from the same ip, from many ips: 192.168.1.200~230_

So from 31 IPs, which can be done with 31 keepalive connections.

Try hitting your server with even 50K real connections and see how long it
lasts (if it lasts at all).

> _They send a request every 5s~30s to server, and wait for response_

Exactly. ALL of them don't do anything concurrently, they just sit idly.

~~~
alexkus
You seem to be missing the point of the scenario they're testing.

Lots of idle connections (doing overlapping long polling) is exactly how many
COMET servers work.

We send ~60 "events" via our COMET server (APE from www.ape-project.org) in a
typical 2 hour period.

The server side work to decide when/what to send the clients is easy because
it's the same information that gets sent whether there is 1 connection or
1,000,000.

The fact they're from just 31 different IP addresses isn't relevant. They're
still individual connections from clients to the end server.

~~~
rorrr
> _The fact they're from just 31 different IP addresses isn't relevant.
> They're still individual connections from clients to the end server._

That's where you are wrong. Not only are they keepalive connections, they are
completely local. Do it over an actual network from 50K different IPs and see
how that performs.

~~~
alexkus
Again you're missing the point. Just checking 31 sockets for data is much much
much less work than checking 600k sockets, even if they are all via local IPs.

I agree that a connection from a local IP is not as much work for the kernel
as from a remote IP, but it's the same amount of work for the server portion
of the software to service each of the connections whether they are local or
remote. Remember too that the host machine is running both the server and the
process generating the client load. Generating the client traffic will be
costlier than what is saved by the local traffic not traversing the full
stack.

Yes, ideally two machines (one with a whole bunch of virtual IPs to fake the
clients, and the other hosting the server) would be a better test, that way
the machine hosting the server is going via the full network stack.

> Do it over an actual network from 50K different IPs and see how that
> performs.

And I don't see what difference having unique IPs or not will make (as far as
how the networking performance of the server will vary). Incoming connections
(from a real network) are going to cause the same amount of work regardless of
the remote IP (assuming there are no DNS lookups); and iptables or firewall
stuff should have minimal impact even if you spray a huge number of unique IPs
at it.

For my testing of a similar scenario I use a couple of old blade servers (2
chassis of 24 PIII 700MHz blades each) to generate the load. Each blade has a
unique IP, and for 500,000 connections I need 1M sockets (each connection can
have two open concurrently as they overlap) = 41,666 sockets per blade; that
fits with a tweak to the ephemeral port range.

My server keeps long polling connections for ~25 seconds. The total network
cost of each poll is ~800 bytes[1] (TCP connection initiation, HTTP request,
HTTP response, TCP teardown). 500,000 polls every 25 seconds = 20,000/sec.

20,000 conns/sec * 800 bytes = 16,000,000 bytes/sec = 128,000,000 bits/sec.

Luckily each blade chassis has 3 x 100Mbps Ethernet ports (Gigabit would have
been nice, but these are old blade servers) on separate backplanes (public,
private, mgmt), so I split the 24 blades up with 8 on each interface to keep
well below the 100Mbps limit of each port.

1. Which is why WebSockets is much more efficient; roll on adoption in the
popular browsers (not just the few who run relatively recent installs of
Chrome/Firefox).

------
dschiptsov
What is the memory usage per connection? How much unnecessary data copying is
going on? What is the latency?

What happens under a real load of, say, a thousand concurrent TCP connections
_together_ with a few thousand pending back-end/other data-source calls?

What will happen to memory usage and latency when the simple setup above
serves the simplest _remote_ requests (which means stalled connections,
re-transmissions, etc.) for 8 hours? 24 hours?

~~~
shenedu
http-kit needs a few kilobytes of memory per connection (a buffer for parsing
the HTTP request, maintaining state, etc.).

The thread model used by http-kit: a dedicated thread (the server loop) does
only event IO and parsing; when that's done, it queues the request for a
thread pool to take. The thread pool computes the response and queues it for
the server-loop thread to write back to the client.
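That handoff can be sketched roughly like this (my own toy names and queues, not http-kit's internals): the "server loop" is simulated by the main thread, workers compute responses, and completed responses flow back through a queue for writing.

```java
import java.util.concurrent.*;

public class LoopAndPool {
    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        BlockingQueue<String> writeQueue = new LinkedBlockingQueue<>();

        // "Server loop": pretend we parsed 3 requests off the wire
        // and handed each to the worker pool.
        for (int i = 1; i <= 3; i++) {
            final int id = i;
            workers.submit(() -> {
                String response = "response-" + id; // computed on a worker thread
                writeQueue.add(response);           // handed back to the loop
            });
        }
        // ...then drain completed responses and "write" them to clients,
        // the way the server-loop thread would between IO events.
        for (int i = 0; i < 3; i++) {
            System.out.println("writing " + writeQueue.take());
        }
        workers.shutdown();
    }
}
```

The design point is that slow response computation never blocks the IO thread, so one loop can keep servicing hundreds of thousands of sockets.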

Since epoll and kqueue's readiness selection is O(1), idle connections do not
hurt latency at all.

