
Scaling node.js to 100k concurrent connections - dworrad
http://blog.caustik.com/2012/04/08/scaling-node-js-to-100k-concurrent-connections/
======
ericz
Before everyone gets excited about these big numbers, I would like to remind
you that even higher concurrency can be achieved with even lower CPU and
memory usage using Erlang. These numbers are good for Node, but don't use this
as evidence that Node is magical and much better at handling large numbers of
connections than other systems.

~~~
jimparkins
People are excited by Node doing numbers like this because there is a massive
active Javascript community - with hundreds of thousands of people using
Javascript all day, every day, at work. Maybe 0.01% of these people would ever
consider learning Erlang, and even if they did they would not be able to use
it at work - ever. As with everything, having better features means nothing if
nobody adopts them. I am not saying nobody uses Erlang, and I am not saying
people are not adopting it - but the numbers are just not comparable to the
Javascript community. Lastly, I realise that just because you know a bit of
Javascript, this does not mean you can architect massive real-time systems.
But it is like WoW: even people who play casually aspire to having the best
kit or playing for a top guild.

~~~
mediocregopher
>As with everything having better features means nothing if nobody adopts

Well, it means my product is going to be superior, since I went with the
better, if less well known, architecture. It's not like Erlang is a little
unsupported side-project of a language; it's actually older than javascript if
you count the time period before it was open-sourced, and only a few years
younger if you don't, and it is used extensively by many industries.

Also, just because javascript the language is more well-known doesn't mean
javascript the server architecture is more well-known. I would argue it isn't;
when people want a highly concurrent, solid server, erlang is always
mentioned.

Lastly, erlang is a pretty easy language to learn. I had the basics down in a
day, I had a prototype pubsub server that could handle 50k connections in two.
The syntax is a bit strange, and honestly it does get in the way sometimes,
but it's not hard.

~~~
maigret
You're missing the point of Node. You can build a web rendering application
and its AJAX parts in one codebase. You can quickly move code between server
and client rendering. Etc. etc. etc.

Probably Erlang is "better". BTW, Java & C are quite fast as well.
Well-written Java applications do scale, and if they don't, there are folks
out there who specialize in making them scale.

Also, Erlang is probably easy to learn as a language. But when you develop a
web app, you have enough other skills to keep up with. Let's name CSS for one
;) The human brain is limited in its capacity to remember APIs and language
specifics.

Also, more popular means more libraries, which in turn makes the product
better. This is why so many folks turn to PHP. It's not elegant, but
everything you need is already there.

Now, I won't argue that you may have good reasons to use Erlang yourself, be
it because you like the language structure, like to write libraries yourself,
or so on. But that doesn't make it "superior", certainly not as a platform.

~~~
ericmoritz
So your argument is that web developers are too stupid to remember Erlang.
Tell that to all those Django developers who have to juggle Python, HTML, CSS
and Javascript! They must be superheroes! Ruby on Rails developers must be as
well!

~~~
maigret
I never said they are stupid. Rather, I think a simpler environment enables
more productivity for the developer. Assembler is hard; some people master it
in incredible ways. Does that mean that C is useless? No. Node goes the way of
unifying the web stack around JavaScript, and I find that at least
interesting. The future will tell the rest.

------
forgotAgain
Garbage collection is disabled. How is this then relevant to any real world
usage?

~~~
sootzoo
He's not running with GC permanently disabled, he's only disabled the
automatic GC because of the huge overhead required (claiming 1-second pauses
every few seconds). He also mentions it's trivial to enable manual GC and run
that via setInterval/setTimeout/what-have-you.
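A minimal sketch of what that setup might look like. Assumptions: the post's era used the old v0.x flag `--nouse-idle-notification` to suppress automatic idle collections, and `--expose-gc` makes `global.gc()` callable; the 30-second interval is an invented placeholder, not a value from the article.

```javascript
// Start node with automatic idle GC off and manual GC exposed, e.g.:
//   node --nouse-idle-notification --expose-gc server.js
// (--nouse-idle-notification is the v0.x-era flag; --expose-gc is what
//  makes global.gc() callable.)

if (typeof global.gc === 'function') {
  // Collect on our own schedule instead of letting V8 pause us mid-request.
  const timer = setInterval(function () {
    global.gc();
  }, 30 * 1000); // every 30s; tune to your allocation rate
  timer.unref(); // don't keep the process alive just for this timer
} else {
  console.log('global.gc is not available; start node with --expose-gc');
}
```

The `unref()` call is just hygiene so the interval alone doesn't prevent a clean shutdown.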

------
gaius
Isn't this really scaling the underlying C runtime to 100k connections?

------
babuskov
I use Node in production. The main thing I like about it is that, looking at
system usage graphs as the number of users grows, the only thing going UP is
bandwidth ;)

I'd really like to see a story of someone really having 100k connected
browsers. My online game currently peaks at about 1000 concurrent
connections, and the node process rarely lasts longer than 2 hours before it
crashes. Of course, using a db like Redis to keep user sessions makes the
problem almost invisible to users, as restart is instantaneous. I'm using
socket.io, express, the crypto module, etc.

I'd really like to see real figures for node process uptime from someone
having 5000+ concurrent connections.

~~~
giulianob
I'm using C# for my game Tribal Hero (www.tribalhero.com). It's still in early
beta, so I've only had 450 concurrent users. Our CPU and memory usage barely
moved from 0 to 450 users. We're using socket selects, not even async sockets,
which would have even better performance. It's also backed by MySQL, though we
want to eventually move to Redis. Why is Node breaking at 1k connections? That
doesn't seem like much at all.

~~~
babuskov
I also use MySQL as a backend; it's practically write-only, as I keep the
whole state in javascript objects. The only time data is read from MySQL is at
program startup. However, having an SQL database enables me to run various
complex SQL queries for reporting.

However, I do use Redis for one thing: user sessions. I turned persistence
off, as Redis seems to be rock-stable and I really don't need sessions to
persist. I was using a modified version of Node's MemoryStore, to which I
added clean garbage collection, but with the frequent restarts I mentioned
earlier it became a pain for users to have to log in again in the middle of
the game. Having a separate, dedicated Redis instance to handle the sessions
made restarts completely seamless, as the cookie sent by the user's browser
remains valid between node restarts.

I was not keen to learn a new db technology, but there wasn't really much to
learn with Redis. You can set it up in minutes and it just works(tm). I highly
recommend you try it.

~~~
giulianob
I'll be switching the entire game state over to Redis. It'll be a bit of work,
but what I like most is that it maps more naturally to objects. My db is
mainly writes as well.

------
antihero
Can uwsgi/nginx be configured similarly?

Is it common practice to have node face the web without nginx?

------
decad
Link to his next post showing him breaking 250k -
[http://blog.caustik.com/2012/04/10/node-js-w250k-concurrent-...](http://blog.caustik.com/2012/04/10/node-js-w250k-concurrent-connections/)

------
devmach
It's a shame that he didn't mention kernel tuning. Without custom settings
(like net.ipv4.tcp_mem), I think it's very difficult to reach these numbers.
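For context, high-connection-count tests on Linux usually involve settings along these lines. The parameter names are real sysctls, but the values below are illustrative placeholders, not tuned recommendations:

```
# /etc/sysctl.conf -- illustrative values only; tune for your workload
fs.file-max = 200000                        # system-wide open-fd cap
net.ipv4.tcp_mem = 786432 1048576 1572864   # TCP memory pages: min / pressure / max
net.ipv4.ip_local_port_range = 1024 65535   # more ephemeral ports (matters for the load generator)
net.core.somaxconn = 4096                   # accept/listen backlog
# apply with `sysctl -p`, and raise `ulimit -n` for the node process itself
```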

------
nivertech
I did 3M/node on physical servers, 800K/node on EC2 instances.

We mostly use Erlang on the server side and node.js + CoffeeScript on the
client side (where they rightfully belong ;)

------
nicolast
It struck me that the author runs his apps as root (in the screenshots). But
then I remembered he's using node.js to handle "thousands of concurrent
connections".

~~~
xentronium
I think it's his testing machine, so it's a fire & forget setup.

------
dotborg2
Looks like the author is not aware of some concurrency problems, deadlocks,
etc. The backend/database might not scale to 100k concurrent connections so
easily.

~~~
benologist
This is where NodeJS _really_ starts to shine - persistent connections and
background operations let you do a whole bunch of cool stuff to mitigate that.

In my case I have entire db tables and collections replicated in memory and
kept in sync via redis pubsub, and the 100,000s of concurrent users I have are
all sharing just a few dozen persistent redis and mongodb connections between
them.

------
ericmoritz
I would really love to know what he did to tune that Rackspace VM. I had a
terrible time trying to get node.js and others past 5,000 concurrent
websocket connections on an m1.large EC2 instance or on Rackspace.

------
mariuz
I wonder what happens at 100k database connections. I'll give it a try with
Firebird and the nodejs driver.

~~~
bradleyland
That's the thing about these types of benchmarks. They're useful for showing
that node has the throughput -- at a low level -- to serve a huge number of
concurrent connections, but that doesn't translate directly to huge
application throughput if you're relying on things like database access over a
network. In practice, each of these problems must be solved individually.

I don't mean to minimize this accomplishment. But if you're assuming you need
100k database connections in order to scale, you might be solving the wrong
problem. Scaling is a matter of moving data as close to the CPU as possible.
This means in-memory caching is where real performance comes in. I don't care
how good your language/framework is, you can't defeat the physics of slow I/O
over a network.
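The "move data close to the CPU" point boils down to something as small as a TTL cache in front of a slow networked lookup. A minimal sketch, with all names invented for illustration:

```javascript
// A tiny TTL cache: a memory hit costs nanoseconds, a db/network miss costs
// milliseconds, so even a short TTL removes most of the slow round-trips.
const cache = new Map(); // key -> { value, expiresAt }

function cachedFetch(key, ttlMs, fetchFn) {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value;           // served from memory
  }
  const value = fetchFn(key);   // the slow path: db/network lookup
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```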

------
bluesmoon
I remember seeing this on HN back in April.

