
Node.js and io.js – Very different in performance - mschoebel
http://geekregator.com/2015-01-19-node_js_and_io_js_very_different_in_performance.html
======
bnoordhuis
Interesting results, thanks for sharing. I can perhaps shed some light on the
performance differences.

> Buffer 4.259 5.006

In v0.10, buffers are sliced off from big chunks of pre-allocated memory. It
makes allocating buffers a little cheaper but because each buffer maintains a
back pointer to the backing memory, that memory isn't reclaimed until the last
buffer is garbage collected.

Buffers in node.js v0.11 and io.js v1.x instead own their memory. It reduces
peak memory (because memory is no longer allocated in big chunks) and removes
a whole class of accidental memory leaks.
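The hazard bnoordhuis describes can be sketched with a small example (illustrative only — the real v0.10 slab logic lived inside Buffer itself, and `Buffer.alloc`/`slice` here just model the shared-backing behavior):

```javascript
// Sketch of the v0.10-style hazard described above: a small slice keeps
// the whole backing allocation alive, because slices share memory with
// their parent rather than owning it.

function smallSliceFromBigChunk() {
  const big = Buffer.alloc(8 * 1024 * 1024); // 8 MB backing chunk (stand-in for a slab)
  // slice() does not copy: this 16-byte view retains the full 8 MB
  return big.slice(0, 16);
}

const tiny = smallSliceFromBigChunk();
// As long as `tiny` is reachable, the 8 MB backing store cannot be
// reclaimed -- the view and the chunk share one ArrayBuffer.
console.log(tiny.length, tiny.buffer.byteLength);
```

With buffers that own their memory (the v0.11/io.js model), the 8 MB chunk would be collectable as soon as `big` goes out of scope.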

That said, the fact that it's sometimes slower is definitely something to look
into.

> Typed-Array 4.944 11.555

Typed arrays in v0.10 are a homegrown and non-conforming implementation.

Node.js v0.11 and io.js v1.x use V8's native typed arrays, which are indeed
slower at this point. I know the V8 people are working on them, it's probably
just a matter of time - although more eyeballs certainly won't hurt.

> Regular Array 40.416 7.359

Full credit goes to the V8 team for that one. :-)

~~~
mraleph
I can explain what happened in the Array case. 100000 used to be the threshold
at which new Array(N) or arr.length = N started to return a dictionary-backed
array. Not anymore: this was changed by
[https://codereview.chromium.org/397593008](https://codereview.chromium.org/397593008)
\- now new Array(100001) returns a fast elements array.
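The pattern affected by that threshold is simply preallocating past 100000 elements and then writing into the array (a hypothetical reduction, not the article's exact benchmark):

```javascript
// In older V8, new Array(100001) crossed the (then) 100000-element limit
// and came back dictionary-backed, making every write below much slower;
// after the change above it stays in fast elements mode.

function fillPreallocated(n) {
  const arr = new Array(n); // the preallocation length is what used to trigger dictionary mode
  for (let i = 0; i < n; i++) {
    arr[i] = i & 0xff;
  }
  return arr;
}

const a = fillPreallocated(100001);
console.log(a.length, a[100000]); // 100001 160
```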

I will check out what happened to Buffer/TypedArray. They should not degrade
that much unless something really went south here.

~~~
mraleph
Ok reporting back. There are two issues here.

The first, major one is related to the mortality of TypedArray maps (aka hidden
classes). When the typed array stored in the Data variable is GCed and there
are no other Uint8Arrays in the heap, its hidden class is GCed too. This also
causes the GC to find and discard all optimized code that is specialized for
Uint8Arrays and to clear all type feedback related to Uint8Arrays from inline
caches. When we later come back and reoptimize, the optimizing compiler takes
the cleared type feedback to mean we need to emit a generic access through the
IC (there is reasoning behind that: it is potentially going to be a
polymorphic access anyway). I have filed an issue[1] for the root cause
(mortality of the typed array's hidden class).

Now there is a second, much smaller issue (which also explains the performance
of the Buffer case): apparently there were some changes in the optimization
thresholds and OSR heuristics. After these changes we hit OSR at a different
moment: e.g. I can see that we hit the inner loop (the one that loops over
`j`) instead of hitting the outer loop, which would lead to better code. In
V8, OSR is implemented in a way that tries to produce optimized code that is
suitable both for OSR and as normal function code - this is done by adding a
special OSR entry block that jumps to the preheader of the loop we are
targeting with the OSR. This allows V8 to reuse the same optimized code for
the normal entry without optimizing again - but it also leads to code quality
issues if OSR does not hit the outer loop, because the OSR entry block
inhibits code motion. This is a known problem and there are plans to fix it.
The hit is usually quite small unless you have very tight nested loops (like
in this case).

Disabling OSR (--nouse-osr) not only "solves" the second issue but also
partially fixes (hides) the first one: 1) we no longer optimize with partial
type feedback, so we never emit a generic keyed access but always specialize
it for the typed array; 2) we no longer emit an OSR entry, hence no code
quality issues related to it.

[1]
[https://code.google.com/p/v8/issues/detail?id=3824](https://code.google.com/p/v8/issues/detail?id=3824)
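The nested-loop shape mraleph describes looks roughly like this (a hypothetical reduction, not the article's exact benchmark); if OSR fires while execution is spinning in the inner `j` loop, the OSR entry block lands there and blocks hoisting work out of the outer loop:

```javascript
// Hypothetical reduction of the hot nested-loop pattern discussed above.
// If V8 triggers on-stack replacement inside the tight inner `j` loop,
// the OSR entry targets that loop and inhibits code motion; entering at
// the outer loop produces better code.

function sumGrid(data, rows, cols) {
  let total = 0;
  for (let i = 0; i < rows; i++) {     // outer loop: the better OSR target
    for (let j = 0; j < cols; j++) {   // inner loop: tight, where OSR tends to fire
      total += data[i * cols + j];
    }
  }
  return total;
}

const grid = new Uint8Array(1000 * 1000).fill(1);
console.log(sumGrid(grid, 1000, 1000)); // 1000000
```

Running the benchmark as `node --nouse-osr bench.js` disables on-stack replacement entirely, which is why it sidesteps the entry-block code-quality issue (and, as a side effect, the partial-type-feedback problem above).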

~~~
mschoebel
Very interesting. After reading your comment, I tried allocating another
Uint8Array and keeping it allocated throughout the entire test as a workaround
for the issue you mentioned. Time for Node.js was unchanged, but io.js was
down to about 5.5s now. Almost the same time as Node. Only about 10% slower.

The same happens when I use the --nouse-osr parameter that you mentioned.
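The keep-alive workaround amounts to holding one extra Uint8Array for the whole run so the hidden class is never collected between iterations (a sketch with illustrative names, not the article's benchmark code):

```javascript
// Sketch of the workaround described above: keep one Uint8Array reachable
// for the entire run so the Uint8Array hidden class (map) survives GC
// even when each iteration's array is collected.

const keepAlive = new Uint8Array(1); // pins the hidden class for the whole run

function runIteration(size) {
  const data = new Uint8Array(size);
  let sum = 0;
  for (let i = 0; i < size; i++) {
    data[i] = i & 0xff;
    sum += data[i];
  }
  return sum; // `data` may be GCed now; `keepAlive` keeps the map alive
}

let last = 0;
for (let iter = 0; iter < 3; iter++) {
  last = runIteration(256); // 0 + 1 + ... + 255 = 32640
}
console.log(keepAlive.length, last);
```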

~~~
mraleph
Is it 10% slower even if you keep array alive _and_ apply --nouse-osr (to both
node.js and io.js)?

On my machine results are fluctuating within the same ballpark (though I am on
Linux and benchmarking 64-bit builds).

~~~
mschoebel
Ok, I hadn't tested with _both_ before. Keeping the array alive _and_ using
--nouse-osr makes io.js only 2.3% slower than my original measurement for Node
0.10.35. Median of 5058ms.

And Node 0.10.35 shows basically the same results as before - less than 1%
difference. Maybe just random fluctuation; even if not, 1% is irrelevant.

------
explorigin
There are two very good comments at the bottom of the article. Here for your
consumption:

Author: (Unknown) 2015-01-19 12:54 UTC

io.js is based on node v0.11, so you need to compare:

\- v0.10 (nodejs) vs. v0.11 (nodejs)

\- v0.11 (nodejs) vs. v1.0 (iojs)

Author: Michael Schöbel 2015-01-19 13:01 UTC

I also downloaded sources and compiled the latest master branch of Node
yesterday evening. Performance was within 2% of io.js for all three tests.

But most people won't compile themselves. Most will use the latest stable
release.

~~~
longlivegnu
>But most people won't compile themselves. Most will use the latest stable
release

I mean there is a good reason that

------
cdnsteve
Reporting benchmark results on a single OS, on a single CPU type isn't really
benchmarking. It's an isolated case of results.

I'd recommend performing a proper suite of performance tests, using different
OSes (CoreOS, Ubuntu) that are actually used in server environments. Different
machine hardware will also play a role.

There's not enough data to come to any conclusion at this point, imo.

This result set is like saying that 95% of the people on the web use the
safari browser on the Apple website.

~~~
morenoh149
... in the apple store

------
Kiro
> This can be extremely important if you have a project with heavy CPU-use

Would you recommend using something different than JavaScript when writing CPU
heavy apps? I was under the impression that it's better suited when dealing
with high I/O.

~~~
richmarr
A few folks have replied with the usual "use C/C++/Java instead", but in the
real world it's often impractical (or, rather, commercially indefensible) to
spin up a different environment with its own training, testing, automation,
documentation and maintenance overheads. A blanket rejection of Node for
CPU-heavy tasks is naive.

On the issue of performance, V8 lets Javascript run pretty quickly. Yes, there
are languages that broadly offer faster execution, but that's far from the
only factor in choosing a solution.

The main issue from my perspective is that the event loop can easily get
blocked by CPU-bound tasks, preventing it from doing other things, e.g.
responding to HTTP requests. You hit a similar problem with a Java servlet
runner, eg. if a couple of your threads are bogged down on CPU-heavy tasks
then they can't be responding to requests.

My personal preference would be to split CPU-heavy operations out so that they
happen elsewhere, regardless of language, e.g having large PDFs generated by
an internal microservice rather than by the webserver, or maybe via a queue in
some cases. But that's just a personal preference.

~~~
k__
Really?

Companies I worked for always had two environments, one for new features and
one for performance.

Like PHP and C: new features were implemented in PHP, and if they caught on,
they got reimplemented in C if they needed better performance.

~~~
richmarr
Sure, if the companies you've worked for are in a field that needs incremental
performance gains and are willing to pay for it then that's totally rational.

Typically I see client-side performance concerns outweighing server-side
performance in a ratio of about 70/30, with the remaining server-side concerns
biased towards I/O - waiting for data, file system reads - at a ratio of 90/10
or more. That puts the potential saving from language or algorithm changes in
the app layer at less than 3% (30% of 10%) for the kinds of apps I've worked
on.

I usually work at companies who are starting out, looking for Product Market
Fit, where those marginal gains aren't worth the cost of reimplementing.

~~~
k__
Fair enough. The companies I talked about didn't do this right from the start.
Most of them after a few years.

------
SixSigma
On a slight tangent, there's an article using the Sieve of Eratosthenes to
demonstrate Communicating Sequential Processes (CSP) on Russ Cox's website
(he's one of the developers of Go):

[http://swtch.com/~rsc/thread/](http://swtch.com/~rsc/thread/)

------
wolframhempel
These are very interesting findings. On a higher level, though: are there any
significant performance differences between the APIs of node and io.js? E.g.
TCP packet processing, file system access, etc.? I know that a lot of them are
effectively C, and thus independent of the V8 version.

------
richmarr
> Depending on what your Node application does, my findings may or may not
> apply to your use-case.

I'm going to go out on a limb and say that the proportion of real-world Node
apps that will be noticeably affected by this is less than 1%.

------
daphneokeefe
The competition between these teams is going to make both of them better. They
will not only be competing on speed, but also on features. A huge win for
developers.

