
Node and Scaling in the Small vs Scaling in the Large - ssclafani
http://al3x.net/2010/07/27/node.html
======
blasdel
He mostly pans the _'less-than-expert programmers'_ canard, so he never
explores the assertion that's at its base — that events are far better for
_correctness_.

Rob Pike has spent the last 25 years developing evented languages, and while
the Hoare-style CSP approach he's settled on allows for physical concurrency,
he doesn't give a shit about bare-metal performance. The fundamental purpose
is to be able to write concise programs that directly model the parallelism of
the real world, written in an expository manner so as to be more obviously
correct.

Pike's point is that you should be getting the best true performance by
working in an environment that helps you arrive at the ideal algorithm in its
purest form. Making compromises to get more local physical concurrency is a
fool's errand, since at scale you're going to far outgrow single machines
anyway!

~~~
al3x
When it comes to the Go language, bare-metal performance certainly seemed to
be something Pike was giving a shit about when he spoke last week at OSCON and
the Emerging Languages Camp I organized. Go in its current fairly young stage
has performance that's not too terrible; they've mostly optimized for raw
compilation speed so far, which is interesting, and uniquely suited to
Google's development problems. Pike has said that he wants Go to be a
replacement for other systems languages. That's going to mean competing with
those languages, performance-wise.

I think Clojure's concurrency model ends up more concisely and correctly
expressing the "parallelism of the real world" when you consider the dimension
of time. Rich Hickey has done some really important thinking there.

Finally, I think you'll find that anyone with a fixed budget for servers isn't
going to think that making the most of "local physical concurrency" on every
machine in their cluster is a "fool's errand". Hardware is cheaper than it
used to be, but it isn't free, and deploying and maintaining it is costly and
time consuming. If you can make the most of your hardware and operations
investments with a little more thought-work, why not do so?

~~~
blasdel
The first public release of Go made little use of physical concurrency, though
that's undoubtedly improved considerably given that the model allows for it.
His earlier languages didn't go there, probably because they didn't have
nearly as many people working on them, and physical concurrency wasn't the
focus of the research. Go is more performant and provides more static
assurances than the previous iterations, but it still has global GC.
Algorithms and Correctness are still the primary focal points.

I find it funny that you're hung up on local physical concurrency — for me
that's the prime signifier of "Scaling in the Small"! If you're going to have
to distribute your workload across multiple machines, why not just run
multiple copies of your single-cpu process on each machine, and let your
network-level work distribution mechanism handle it?

We're not talking about shared memory supercomputing using OpenMP and MPI on
special network topologies or NUMA hardware, just commodity machines running
HTTP app servers in front of data stores. We aren't curing cancer, and our
workload is already embarrassingly parallel (responding to discrete requests).

I've actually disabled some intra-request concurrency (some ImageMagick
operations are multithreaded by default) on a system I'm working on now
because it makes the workload wildly inconsistent — when independent requests
try to take advantage of all the available CPUs, the latencies spike for
everybody. It's just more software to have to monitor, and the ideal returns
are slim.

~~~
al3x
It's not so much that I'm "hung up" on local physical concurrency, just that I
don't see any reason to ignore easy gains. You can write maintainable,
correct, concurrent programs that scale across cores today if you use
technologies other than Node. So why wouldn't you?

Anyway, that's a good reply, and your ImageMagick case study is interesting.
It just goes to show how individualized "scaling" really is. Thanks for
taking the time.

~~~
blasdel
I haven't actually had a project to use Node on yet, though I have an affinity
for its _Purity of Essence_ approach, and I've used Twisted a fair amount.

If it's going to actually scale or have high-availability, a system is already
going to have to have a socket pool to distribute across machines. Since that
already works to distribute across processes on one machine, why bother adding
another layer to pool native threads? It just adds to the complexity budget.

It could easily get you much lower (1/N) latencies on unloaded systems, but in
most cases at high volume the gains aren't going to be very big compared to
running another N-1 single-threaded processes per box and it would take more
concerted effort to keep the request latencies consistent.

It seems obvious that it's worth compromising the model and adding locks or
STM to get an N00% gain for a solitary request, but what if that's only a 10%
gain when you're pumping out volume? Do stop to consider that not everyone
_wants_ their individual app processes to scale across cores — actual
simultaneous execution within one address space comes at a cost.

There are probably some cases at scale where the gains from thread-pooling are
substantial, but I could see a lot of them being where the work wasn't very
event-ish to start with, like big-data batch-flavored stuff where Hadoop would
work great (especially given its data locality).

------
moe
_(No, I’m not making the “callbacks turn into a pile of spaghetti code”
argument, although I think you hear that time and again because it’s an actual
developer pain point in async systems.)_

Amen.

On my first try I was mostly underwhelmed with node precisely because of the
callback hell you end up in. I've already had my share of that in Twisted,
and despite the arguments various people make for it: thanks, but no thanks.

That's not to bash node as a whole, mind you. I'll most certainly revisit it
when and if it grows a stable co-routine wrapper or similar spaghetti
prevention facilities.

~~~
DTrejo
Managing lots of callbacks in Node.js:

[http://stackoverflow.com/questions/1809619/managing-lots-of-...](http://stackoverflow.com/questions/1809619/managing-lots-of-callback-recursion-in-nodejs)

------
KirinDave
I'm glad to see people starting to push back against the cult of “Evented is
faster.”

It may be, it may not be. The idea that it always is, though, that's
balderdash.

~~~
Tichy
It's just obvious how "threaded" can be slow even in low-scale scenarios. It's
less obvious how evented can be slower than the hardware allows (not saying it
can't be). If nothing is known about the large scale, it still looks like a
clear win for evented.

~~~
frognibble
I keep hearing assertions that threaded can be slow or does not scale. Can
somebody describe the reasons here or point to a good document on the subject?

I often see "scalable" and "fast" used interchangeably in these discussions.
These are different concepts. Is the contention that threading is not as
scalable as evented, not as fast as evented, or both?

~~~
Tichy
I think traditionally, something like Tomcat used to have perhaps 30 threads?
Especially with today's Ajax-driven web sites, it is very easy to exhaust them.

I can only point to a stupid app of mine, which triggered Twitter searches for
a list of search results via Ajax (so one page of search results would
trigger, say, 10 search requests to Twitter via Ajax, proxied through the
server). Since that app was written for a competition and non-commercial, I
hosted it on the free Heroku plan (it was a Rails app). That is how I found
out that apparently that Heroku plan has only one process, and Rails doesn't
use threads. So not only were the search requests to Twitter handled
sequentially; while they ran, no other visitors could use the site (the 10
Ajax requests alone would have consumed up to 10 threads, and I had only
one).

Just a stupid example - with event based request handling, it would not have
been an issue because the search requests to Twitter would not have blocked
everything else.
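
(A sketch of why, with simulated timings: in an evented loop a slow proxied
search sits pending while an unrelated fast request completes first.)

```javascript
const completions = [];

// Simulated slow proxied Twitter search (50 ms of upstream latency).
function slowSearch(cb) {
  setTimeout(() => { completions.push('search'); cb(); }, 50);
}

// Simulated fast, unrelated page request.
function fastPage(cb) {
  setImmediate(() => { completions.push('page'); cb(); });
}

// The slow request is accepted first, but nothing blocks waiting on it:
slowSearch(() => {});
fastPage(() => {});
```

With one blocking thread per request, the page view would have queued behind
the search instead.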

~~~
dedward
But a request to twitter blocking everything else would be an issue, right?

Aside: haproxy, configured cleverly, can deal with limiting the number of
concurrent connections permitted to your app on a URL basis (or just about any
other part of the request you can deconstruct) to allow your app to not get
hung up on slow queries and keep fast queries going where you want them.

~~~
Tichy
With something like node, a request to Twitter would not block.

Not sure how haproxy would help?

------
dmillar
Whether you agree with him or not, Alex is one of the best technical
writers around. This post is exemplary of that.

------
mjijackson
Excellent piece of writing.

I haven't done a whole lot of concurrency programming, so before reading this
article I didn't realize that the events vs. threads debate is still being
had. The example that Ryan Dahl likes to use to illustrate the superiority of
the event model whenever he talks about Node is nginx vs. Apache. That example
did more in my mind to reinforce the idea of events being superior to threads
(in terms of speed _and_ memory consumption) than anything else.

Keep in mind when reading this article that Alex recently left a little
company called Twitter. It's safe to say that relatively few companies will
ever have to scale the way that Twitter has.

~~~
moe
_Keep in mind when reading this article that Alex recently left a little
company called Twitter. It's safe to say that relatively few companies will
ever have to scale the way that Twitter has._

You mean the failwhale way? ;-)

Not meaning to discredit al3x but I really don't consider twitter a success
story in terms of scaling. I don't know if it's incompetence or just some
truly bad early decisions that they're still suffering from.

But one thing is for sure; other sites of much higher complexity have scaled
_much_ more smoothly to similar sizes (facebook and flickr, just two off the
top of my head).

~~~
al3x
For what it's worth, I don't think Twitter currently counts as a scaling
success story either. Hence this in my post:

"Twitter is still fighting an uphill battle to scale in the large, because
doing so is about much, much more than which technology you choose."

That's part of my point.

~~~
moe
Well, as you've been outed as being involved with twitter first-hand, can you
shed some light on the problems they are having? Or is that internal stuff,
not to be talked about?

I'm curious because I've worked with various messaging systems myself. And
although I've never pushed them to near twitter-scale, the principles of
scaling them horizontally seem quite straightforward to me, unless complex
routing comes into play. But I don't see complex routing at twitter.

To be more precise: For all I know twitter could (should?) probably just
append those tweets to one file per user and would scale beautifully from
there. Auxiliary services like search are a different story, of course, but
those don't need to trigger failwhales when they break...

What wall do they keep hitting?

~~~
al3x
The complexities of scaling Twitter have been pretty thoroughly discussed
elsewhere, both by Twitter staff and informed third parties.

At this point in time, I don't really want to comment any further on what
issues they may or may not be having. I haven't worked there in a couple of
months, and I'll bet that big parts of the system have changed in that time.
I'm no longer informed about what's under the hood at Twitter, and I also
don't want to second-guess my former coworkers, who I'm sure are doing their
best. Sorry!

------
felixge
Node does support two models for concurrency: Non-blocking I/O and worker
processes.

Non-blocking I/O is no magic pony, so if you need access to more CPU cores,
you start creating worker processes that communicate via unix sockets. I would
argue that this is a better scaling path than threads, because if you can
already manage the coordination between shared-nothing processes, moving to
multiple machines comes naturally.

Otherwise I agree with the post. Nothing will allow you to scale a huge system
"easily".

~~~
rektide
There are features you miss by relying on IPC. UNIX Sockets are zero copy,
sure, but you still lose performance doing the marshal/unmarshal tasks, and
that could be largely avoided using shared immutable memory.

I'd much rather Node have a multi-threaded Web Workers implementation that
uses immutable shared memory for message passing. Not coincidentally, I'd much
rather push the real distributed-scaling problems (such as coordination and
marshalling/demarshalling) up to a much, much higher level in my application
stack. As it stands now, I need multiple marshal/unmarshallers: one for the
IPC layer and one for the app's scaling-out/distributed layer.

------
blasdel
> Herein lies my criticism of Node’s primary stated goal: “to provide an easy
> way to build scalable network programs”. I fundamentally do not believe that
> there is an easy way to build scalable _anything_. What’s happening is that
> people are confusing easy problems for easy solutions.

Even though I disagree with much of his technical argument, this is an
extraordinarily important point that I find myself agreeing with more and more
upon rereading. Nothing is scalable out of the box, anything can be fucked up,
and there's no silver bullet.

------
rbranson
What people tend to forget about Node.js is that everyone knows JavaScript.
This is really its killer feature against other platforms. The ability to get
a team up and running on Node is unparalleled.

~~~
jhuckestein
Unfortunately, everybody "thinks" they know JavaScript. This is dangerous in
many ways and might just as well be a node disadvantage.

~~~
rektide
the particular danger i'd cite w/r/t JS is that it's a much more flexible
language than many, and that there's not the same level of entrenched best
practices and tooling to enforce them. in other words, everyone knows JS,
granted, but they know very different styles of it.

consider the plethora of ways of doing inheritance that exist for JS.
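
(Two of the many coexisting styles, as a sketch with made-up names: the
pseudo-classical constructor pattern and direct prototypal delegation.)

```javascript
// Pseudo-classical: constructor functions plus an explicit prototype chain.
function Animal(name) { this.name = name; }
Animal.prototype.speak = function () { return this.name + ' speaks'; };

function Dog(name) { Animal.call(this, name); }
Dog.prototype = Object.create(Animal.prototype);
Dog.prototype.constructor = Dog;

// Prototypal: plain objects delegating directly to other objects.
const animal = {
  speak: function () { return this.name + ' speaks'; }
};
const cat = Object.create(animal);
cat.name = 'felix';

console.log(new Dog('rex').speak()); // "rex speaks"
console.log(cat.speak());            // "felix speaks"
```

Both produce the same behavior, but a team mixing them freely ends up with
code that reads very differently from file to file.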

------
cgbystrom
I think Alex confuses "scaling in the small" with performance. Rewriting
Starling in Scala most probably improved Twitter, but did it make it more
scalable?

To me as an outsider, it sounds more like Starling (or Kestrel) got its
performance upgraded, not its scalability.

May sound pedantic, but it's important to differentiate performance from
scalability.

------
jhuckestein
IMHO you may be able to scale Node 'in the big' if you plan to scale at a
level above Node (i.e. across Node instances and inter-instance communication)
instead of changing the code running inside Node to scale big.

~~~
rektide
ryah immediately posted the following series of tweets to that effect:

"Thanks-some valid criticisms. The story doesn't stop at the process boundary.
Concurrency through Spawning&IPC should be discussed"

"ie, "actors", although not explicitly given that name, are baked in. I just
don't feel the need to put them inside the same process."

Asked about implementation details:

"Not really. It's not fancy: unix socks, no framing protocols, no RPC daemons.
But using what's there is a valid concurrency solution."

------
jamesshamenski
I think this is a fascinating look at an emerging company diving through the
discovery process of picking core technologies. al3x, please keep going with
these.

------
blackrabbit
Really great piece; al3x is clearly a strong writer, even on heavily
technical material like this.

Interested in al3x's thoughts on RoR vs. Python/Django as well.

~~~
rufugee
_If, on the other hand, you’re working with a system that allows for a variety
of concurrency approaches (the JVM, the CLR, C, C++, GHC, etc.), you have the
flexibility to change your concurrency model as your system evolves._

I think you have his thoughts right there. Ruby and Python don't allow for a
_variety of concurrency approaches_...at least not on the threading side. You
have to resort to multi-process or evented schemes.

------
c00p3r
Also, do not forget the classics: <http://www.kegel.com/c10k.html> ^_^

