
Pushing the performance limits of Node.js - henridf
http://www.jut.io/blog/2015/pushing-node-js-performance-limits
======
fein
> one effective way that we get high performance out of node.js is to avoid
> doing computation in node.js.

And therein lies the problem with node. It's great for rapid prototyping, but
things fall apart pretty quickly with CPU-laden tasks. That and the whole
single-threaded thing with node (technically JS) can be a wrecking ball when
one task takes too long.

~~~
NathanKP
I've done CPU-laden tasks many times in Node. The key is how you break up the
heavy task into pieces that can be distributed more evenly on the event loop
instead of blocking it. For example, processing a huge data stream is
typically done by breaking it into chunks where each chunk can be processed in
a few ms, rather than doing one giant blocking operation on the entire stream.
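
Something like this (a rough sketch; `processItem` and the batch size are
placeholders, not anything from the article):

```js
// Process a big array in small batches, yielding back to the event loop
// between batches via setImmediate so pending I/O callbacks aren't starved.
function processInChunks(items, processItem, done) {
  var i = 0;
  var BATCH_SIZE = 1000; // tune so each batch takes only a few ms

  function next() {
    var end = Math.min(i + BATCH_SIZE, items.length);
    for (; i < end; i++) {
      processItem(items[i]);
    }
    if (i < items.length) {
      setImmediate(next); // let other event loop work run before continuing
    } else {
      done();
    }
  }

  next();
}
```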

Done properly, this also leads to a low-memory process that will grab data
from an incoming socket stream, transform it, then push that data back out
through an outgoing socket stream, only ever keeping a few MB of the stream in
working memory at a time. Slap a cluster of these streaming processes together
on a single box to utilize all CPU cores and you can easily push many Mbps of
IO through your server quickly and efficiently.
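
In code, the shape is roughly this (a toy sketch, not Jut's code; the
uppercase transform and the port are made up):

```js
var net = require('net');
var stream = require('stream');
var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  // One worker per CPU core, each running its own copy of the server.
  os.cpus().forEach(function () { cluster.fork(); });
} else {
  var server = net.createServer(function (socket) {
    // Trivial transform: uppercase each chunk. pipe() applies backpressure,
    // so only a small amount of the stream is buffered in memory at once.
    var transform = new stream.Transform({
      transform: function (chunk, encoding, callback) {
        callback(null, chunk.toString().toUpperCase());
      }
    });
    socket.pipe(transform).pipe(socket);
  });
  server.listen(9000); // cluster workers share the listening socket
}
```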

Additionally, even if you do have a heavy blocking system IO operation that
you can't break up very well, you can increase `UV_THREADPOOL_SIZE` when
needed to reduce the impact of the blocking.
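
(That's just an environment variable read by libuv, default 4 threads, so
it's set when launching the process; the script name below is a placeholder.)

```
UV_THREADPOOL_SIZE=16 node server.js
```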

~~~
SignMeTheHELLUp
But now you've just implemented thread context switching by pulling its
implementation details into your algorithm... Just use a language that
supports threading if you have real work to do...

~~~
NathanKP
Sometimes it is more efficient to implement the context switching yourself.
For extremely high-performance server use cases such as realtime analytics,
streaming video processing, high-frequency trading, etc., it isn't uncommon to
have server code written in C/C++.

It really depends on what your most important metric is. If the most important
metric is speed then having more control over context switching is often
better.

------
xlm1717
I had never read about that 1.5GB heap limit. I've come across that same
error message - `FATAL ERROR: JS Allocation failed - process out of memory` -
several times in my application, but googling never gave me a good answer as
to why this was happening, especially since I had a lot of spare memory on the
server. Very good thing to know...
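
For anyone else who runs into it: that limit is V8's default old space size,
and it can be raised with a flag (value in MB); the script name here is just
an example.

```
node --max-old-space-size=4096 app.js
```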

------
abritinthebay
Seems to me like there's very little new here to anyone paying attention to
the space. They didn't raise any limits - they just fixed their own badly
written code.

The 1.5 gig limit is not widely known (it's a limit of V8 and I believe,
correct me if I'm wrong here, that it's improved in later versions due to some
changes to GC), but if you're hitting it you're doing something _massively_
wrong, like they were here.

Splitting data up into chunks for the event loop to process is priority one in
any Node app that deals with data processing. It has been for a long time.
This is true _anywhere_ you run JavaScript. It's where libraries like
[Highland](http://highlandjs.org/) excel and why
Node has the concept of Buffers.
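
For example, a Highland pipeline pulls a file through in small pieces instead
of slurping the whole thing into memory (illustrative sketch; the file name
and the parsing are made up):

```js
var _ = require('highland');
var fs = require('fs');

// Lazy, pull-based pipeline: only a little of the file is in memory at once.
_(fs.createReadStream('events.log', { encoding: 'utf8' }))
  .split()                                             // one element per line
  .filter(function (line) { return line.length > 0; })
  .map(function (line) { return JSON.parse(line); })
  .filter(function (evt) { return evt.level === 'error'; })
  .each(function (evt) { console.log(evt.message); });
```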

Chunking data should be a no-brainer and it's frankly a little strange that
Jut weren't doing this in the first place. It raises questions about what else
is not being done correctly under the hood.

There's no meat here (aside from learning that NPM uses them).

------
KAdot
> In fact, as of this writing, the JPC depends on 103 NPM packages

It's really scary that you need so many packages to build a web app using
Node.js.

~~~
danbruc
When I read that it took 20 seconds to compile the list of the 10 most
downloaded packages during the past 14 days, my first reaction was that this
seems ridiculously slow. How many downloads could that query have to
accumulate?

A search then revealed that they have surpassed a billion downloads per month.
Who on earth installs a billion packages every single month?

~~~
NathanKP
Most companies using Node probably also have some form of CI. For example,
the company I work at probably generates tens of thousands of package
downloads a day because we have CI running Docker container builds for
automated testing on every commit, and each container build downloads packages
from scratch for an entirely fresh build.

Once a container is built and passes tests it is reused as many times as
needed for deploying out to edge hosts, so there are no additional package
downloads after that, but the continual CI builds throughout the day as people
commit code still generate a lot of downloads.

~~~
danbruc
Why would you download dependencies on each build instead of downloading them
once, when you integrate a new dependency into the project, and then just
keeping them under source control?

~~~
NathanKP
If you have a thorough test suite, then it is possible to use a package
version pattern such as `~1.1.0`, or even `^1.1.0`, to allow the package to
upgrade automatically when the maintainer releases a new version.
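
e.g. in package.json (the package names and versions here are just
illustrative; `~4.13.0` takes new patch releases only, while `^3.10.0` takes
any compatible minor or patch release):

```json
{
  "dependencies": {
    "express": "~4.13.0",
    "lodash": "^3.10.0"
  }
}
```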

Obviously, being able to do this is highly dependent on having really
thorough test coverage to make sure that automatic package upgrades don't
break stuff, but this would be one primary reason for downloading dependencies
fresh for each build.

But even if you had locked down the package versions, it still wouldn't be a
good idea to commit the packages into your source control. Part of the benefit
of hitting NPM to download the package is that package maintainers will often
explicitly deprecate a package version that should now be considered outdated,
or perhaps one which had a security vulnerability. This will show up as a very
visible warning message when running `npm install`.

This has alerted me multiple times to issues with child modules I had added to
my project, or even grandchild modules that were included by other child
modules.
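
(For reference, the warning comes from the `npm deprecate` command; a
maintainer runs something like the line below, with a made-up package name
here, and every later `npm install` of that version prints the message.)

```
npm deprecate some-package@1.2.3 "security vulnerability, please upgrade to 1.2.4"
```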

