

C500k in Action at Urban Airship - cscotta
http://blog.urbanairship.com/blog/2010/08/24/c500k-in-action-at-urban-airship/

======
tmountain
I'm curious what kind of OS tuning was required for 500k concurrent
connections. It's obvious that some ulimit tuning is necessary, but I'm
wondering if any other hard limits beyond that were hit? Also, in case the
original author is reading, you might find this interesting as well. It's an
article on trying to reach 1MM concurrent connections with Erlang.

[http://www.metabrew.com/article/a-million-user-comet-
applica...](http://www.metabrew.com/article/a-million-user-comet-application-
with-mochiweb-part-1)

~~~
superjared
We modified /etc/sysctl.conf to use really small per-socket buffers, raised
the ulimit to allow 999999 fds, and widened
/proc/sys/net/ipv4/ip_local_port_range to make the full range of ephemeral
ports available for connections.
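
For reference, the changes looked roughly like this (illustrative values
rather than our exact settings - tune buffer sizes to your own workload):

    # /etc/sysctl.conf - apply with `sysctl -p`
    # Shrink per-socket TCP buffers: min/default/max, in bytes
    net.ipv4.tcp_rmem = 4096 4096 16384
    net.ipv4.tcp_wmem = 4096 4096 16384
    # Raise the system-wide file descriptor ceiling
    fs.file-max = 999999
    # Widen the ephemeral port range
    net.ipv4.ip_local_port_range = 1024 65535

    # /etc/security/limits.conf - per-process fd limit (what `ulimit -n` sees)
    *    soft    nofile    999999
    *    hard    nofile    999999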

------
dayjah
Thanks for doing this; definitely very interested in the follow-up posts.
Having just finished working for a company that 'claims' to be expert in this
area, it's very interesting to see you guys wipe the floor with your
connections/node ratio.

------
Sukotto

      Jumping from 35,000 connections per node on an EC2 
      Small instance to over half a million on a single 
      EC2 Large...
    

You moved the goalpost. I think the conclusion would make a lot more sense if
you compared apples to apples when it comes to your server... either test
everything on a small or on a large.

~~~
cscotta
Apologies for not publishing the EC2 Small number in the original post. It was
120,000 for the pure Java NIO implementation, but I'd like to explain that one
a bit further.

We hit 120k clients at the time the process was killed by the kernel, but had
about 500mb free memory left when the OOM kill occurred. Here's why (and
please correct me if I'm mistaken - I'm not a kernel expert, but have done a
lot of reading in this area over the past couple months):

Small EC2 instances are restricted to running a 32-bit kernel. The 32-bit
Linux kernel (2.6 series) divides memory into three zones, with the first
and smallest zone reserved for DMA (~16mb), the next portion reserved for
kernel functions ("low memory", ~896mb), and the rest allocated as "high
memory" for userspace. There are very limited options for configuring this
allocation, short of recompiling and maintaining a custom kernel, which we do
not want to do. The Hugemem kernel allocates low memory differently, but is no
longer being actively recommended (that I can see), and only makes sense for
servers with > 4GB anyway.

Because TCP socket structures are allocated in low memory, whose size is
essentially fixed, 120k sockets will exhaust low memory even while about 500mb
of high memory remains free and unallocated. At that point, the kernel has no
memory left for its own work, so the OOM killer steps in and shoots the
process.
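
As a back-of-envelope check:

    ~896mb low memory / 120,000 sockets ≈ 7.5kb per socket

of kernel-side state (socket structures plus send/receive buffers), which is
plausible given the shrunken buffer sizes mentioned above. On a 32-bit kernel,
you can watch low memory drain via the LowFree line in /proc/meminfo (or
`free -l`).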

The 64-bit kernel makes no distinction between "low" and "high" memory, and
does not suffer this problem. We switched to an EC2 Large instance in order to
avoid this limitation and properly test the service. In the end, if Amazon
were to offer a 64-bit Small instance, it would be ideal for our application
and would push our cost per client even lower, though it's currently at a very
acceptable spot.

While the metrics in this post focus on connections per node, our ultimate
goal involves both maximizing the number of connections per instance and
minimizing the number of instances required (essentially, we want to [safely]
maximize density across the board). The sentence you quote refers to this
cumulative goal rather than to the particular comparison of implementations
and instance types. I should have been clearer.

Anyhow, pardon this omission from the original post. At some point, I might
write a bit more about the TCP/kernel-level issues we ran into if anyone's
interested.

~~~
amalcon
It would be interesting to know what the "slow" software looked like on the
big instances, then. Still, a "large" instance is not fifty times the size of
a "small" instance, so it looks like you're easily coming out ahead.

------
dschobel
Can someone clarify what a spike is? It doesn't appear to refer to just
resource load...

 _"...we spent several hours fanning out that spike to include three
versions"_

~~~
robotadam
It's a spike as in a spike solution -- an end-to-end prototype to see if a
chosen method of solving the problem is feasible, but without writing tests or
handling all error cases or the like.

<http://c2.com/xp/SpikeSolution.html>

~~~
dschobel
thank you.

------
bad_user
Why the difference when working with code written in Scala?

What does Scala do to make the number of connections 50% less?

------
brandon
I'd be particularly interested to know what the JVM spikes were doing with
their respective connections. I'd love to profile some roughly-equivalent
implementations using Erlang/Twisted/libevent/et al.

FOR SCIENCE

------
tomjen3
Interesting, but I wonder why they didn't take a look at Erlang - I mean this
is pretty much what it was meant to do.

~~~
WALoeIII
Perhaps they did not have a cool enough accent? 'EARRRRRLang'

 _Kidding_ - I would be very interested in more on why Scala failed as well.
Both Scala and Erlang are implementations of the Actor model and should
theoretically be very well suited to this.

~~~
schmichael
When trying to support the maximum number of connections per box the model
with the best conceptual fit doesn't necessarily have the best functional
characteristics. Actor models can have much higher per-connection resource
usage. Message passing isn't free, and wrapping each connection in an
Actor/Greenthread structure isn't free either.

At the end of the day, a hashmap of sockets is a pretty compact data
structure. _If_ you can minimize synchronization/locking, threads with shared
state are extremely efficient from a CPU and memory standpoint. Maybe not so
efficient from a developer's time/sanity standpoint, though. :-)
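
To make that concrete, here's a minimal sketch of the idea (not Urban
Airship's actual code - the class name, port, and id scheme are made up): a
single selector thread accepts connections and parks each channel in a
hashmap, so other threads can push messages to specific clients later.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;
    import java.util.concurrent.ConcurrentHashMap;

    public class EdgeServerSketch {
        // All the per-connection "state" there is: an id -> channel map.
        private final ConcurrentHashMap<Long, SocketChannel> clients =
                new ConcurrentHashMap<>();
        private long nextClientId = 0;

        public void serve() throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9999));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select(); // block until a channel is ready
                Iterator<SelectionKey> it =
                        selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            clients.put(nextClientId++, client);
                        }
                    }
                }
            }
        }

        public static void main(String[] args) throws IOException {
            new EdgeServerSketch().serve();
        }
    }

Pushing a payload to client 42 from a queue-draining thread is then just
clients.get(42L).write(buf); the map lookup and the channel itself are
essentially the only per-connection overhead.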

Do note that this article is purely about an edge server though. Its job is
essentially to hold sockets and communicate with internal queues. The message
passing model is alive and well, albeit at a higher level.

------
russell_h
Someone mentioned this in the comments, but I'm curious about the outcome of
the Node.js spike as well.

------
niallsmart
Does keeping a socket open affect the handset battery life much?

~~~
mtrichardson
Not in our tests, and nobody in our beta test period complained. The logic
controlling the connection is pretty intelligent about what it does.

A connection that's open isn't problematic at all, assuming it's being used -
lots of setups/teardowns can be an issue (which is why polling apps will
consistently eat a large portion of your battery life). The main issue is the
app using the radio when it shouldn't be. Luckily, we can control this pretty
well with our handling code, and we've been able to balance delivering a
message as quickly as possible against not destroying battery life.

If you're an Android dev, we'd love your feedback on it in general :)

