
Node.js memory benchmark confirms V8's GC may not be ready for the server - hannesw
http://hns.github.com/2010/09/29/benchmark2.html
======
felixge
To the author of this article: Could you run your node test with "node
--trace-gc <script>"? That will output when, and for how long, node's GC
is doing its thing.

Anyway, this could be a legit complaint at this point.

~~~
mraleph
--trace-gc will show an endless sequence of mark-sweep/compacts.

I explained why that happens in my comment to the original post.

~~~
felixge
Great comment! Anybody interested should check it out:
[http://hns.github.com/2010/09/29/benchmark2.html#comment-820...](http://hns.github.com/2010/09/29/benchmark2.html#comment-82051337)

~~~
olegp
It explains why JSON parsing doesn't perform well, but it doesn't explain the
original benchmark results. It would be great to get a comment from @mraleph
or the other V8 guys on that.

~~~
mraleph
Speaking of the GC behavior of the other benchmarks:

1) The one with Buffers is also causing mark-sweep/compact pauses (7-15ms
each) because the Buffer constructor calls
AdjustAmountOfExternalAllocatedMemory, which triggers a full GC cycle if V8
thinks too much memory is held outside its heap. (A minimal repro sketch
follows after this list.)

2) GCs in the string-based benchmark are mainly scavenges taking 0 ms, plus
fewer than 10 full collections taking <6 ms each on my desktop.
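
Here is a minimal repro sketch (not the actual benchmark; the sizes and
counts are my guess) that should show the same pauses under --trace-gc:

    // Each Buffer reports its size to V8 via
    // AdjustAmountOfExternalAllocatedMemory, so allocating many of them
    // pushes V8 over its external-memory threshold and forces full
    // mark-sweep/compact cycles. Run with: node --trace-gc repro.js
    var live = [];
    for (var i = 0; i < 100000; i++) {
        live.push(new Buffer(25 * 1024)); // ~25KB, like the benchmark payload
        if (live.length > 100) live.shift(); // keep a bounded working set
    }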

That is all I can say. V8 GC is performing well here from my point of view.

~~~
hannesw
I just found out the same thing. So I guess it must be some intra-VM data
shifting? Anyway, I'll update my post accordingly.

~~~
mraleph
I don't know node.js well enough to even make a guess here. Somebody needs to
profile it.

Updating your post sounds like a good idea; it has created a lot of confusion
among developers.

~~~
hannesw
Updated!

------
amix
V8 has a very impressive garbage collector (stop-the-world, generational,
accurate), and the GC is probably a part the Google team has spent a lot of
time tuning and working on, as it's one of the hardest and most important
parts of building a VM...

My guess is that node's GC configuration isn't finely tuned for 25KB
structures, or maybe the GC is called prematurely.

Some suggestions: try turning off the GC and re-doing the benchmark, try
smaller JSON data structures, and try different versions of node. Each of
these would give more evidence as to where the problem is.

Btw., which versions of RingoJS and node.js are used in that benchmark? How
much memory does each server use in the end?

Edit: What type of garbage collector does RingoJS/Rhino use - how is the GC
configured for RingoJS?

~~~
hannesw
Most of these questions are answered in my original, longer post:
<http://hns.github.com/2010/09/21/benchmark.html>.

The JSON I'm parsing is just objects with short string properties (around 10
characters). There's just one longer 25KB JSON string, but that is never
collected. As to Node configuration, can you provide some specific options to
use? I've been asking about this on #node.js (and asked ryan), and I'm open
to any suggestions.
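
To make the workload concrete, here's a sketch of roughly the kind of data
involved (not the actual benchmark fixture; property names and counts are
made up):

    // objects with short (~10 character) string properties, serialized
    // to JSON and parsed on every request
    function makePayload(n) {
        var obj = {};
        for (var i = 0; i < n; i++) {
            obj['key' + i] = 'value' + i; // short string property
        }
        return JSON.stringify(obj);
    }
    var json = makePayload(500); // vary n to test smaller/larger structures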

Ringo is running on the server HotSpot JVM without any further options.

~~~
xearl
By default, Java 6 uses a generational collector with multi-threaded
stop-the-world copying for the young generation and single-threaded
stop-the-world mark-sweep-compact for the tenured generation.
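
For reference, HotSpot lets you pick the collectors explicitly (these are
standard flags; whether any of them changes the picture here is untested):

    # single-threaded collection everywhere:
    java -XX:+UseSerialGC ...
    # parallel young-generation copying (the server-class default):
    java -XX:+UseParallelGC ...
    # concurrent mark-sweep for the tenured generation:
    java -XX:+UseConcMarkSweepGC ...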

------
sh1mmer
I think the editorializing of the headline is unnecessarily negative (read:
biased). There is obviously an issue Ryan is working on addressing, but Node
is clearly already providing performance that is suitable for many server
workloads.

~~~
hannesw
I'm the author of both the original article and this HN posting - and yes, I
am biased, since I'm the main developer of RingoJS (the other platform in that
benchmark). I've made that quite clear and provided additional background in
the original benchmark to which this is just a short update:
<http://hns.github.com/2010/09/21/benchmark.html>

I think my benchmark and the conclusions I draw from it (after a lot of
thinking) are fair. My intention is just to make people see there's no magic
bullet with performance or scalability, and that there are alternatives for
server-side JavaScript.

~~~
sh1mmer
I think your conclusions in the article are fair. I think the title on HN is
misleading because it's a quantitative issue.

V8's GC is a well-known concern in the Node community, but it's still
performing well enough that Node is considerably faster than traditional
servers (like Apache). The fact that Ringo is also faster doesn't make V8
"not ready"; it just means it could be improved.

If I wanted to be contentious, I could suggest that "Hacker News comments
confirm that RingoJS may not be ready for developers" because the author
likes taking pot shots at other frameworks. But that would be petty, wouldn't
it?

~~~
hannesw
You are right about the title. That "not ready for the server" is a foolish
phrase. I'd change it to "not tuned for the server" if I could, but it looks
like it's impossible to change that now.

~~~
IsaacSchlueter
Maybe you could change it to "Not as optimized as Rhino to parse JSON under
heavy load."

------
js4all
First, the response time variation is an important observation, thanks for
that. To make sure that it is caused by V8's GC, we need a GC log from the
benchmark.

Second, to make this a fair comparison, you need to use a similarly sized
heap. It could be that the JVM heap was large enough to run the whole test
without a major GC. We need a GC log for this part of the test as well.

Sun was aware of the different GC strategies needed for server and client use
and let us choose between them. V8, however, seems to be optimized for the
client side.
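
Concretely, something like this would produce the logs and pin the JVM heap
(the flags are standard; the 64m heap size is just an example and would need
tuning to match V8's):

    # node side:
    node --trace-gc server.js
    # Ringo/JVM side, with a fixed heap and GC logging:
    java -verbose:gc -XX:+PrintGCDetails -Xms64m -Xmx64m ...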

------
IsaacSchlueter
I'm not yet convinced this is due to GC. I sent a pull request[1] to Hannes
to use the faster Buffer technique, to at least rule out interference from
V8's crazy slow string juggling under load.

1: <http://github.com/hns/ringo-node-benchmark/pull/1>
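
For the curious, the general idea looks something like this (a sketch, not
the actual patch; see the pull request for the real diff):

    // respond with a pre-encoded Buffer instead of a JS string, so the
    // response bytes are produced once rather than re-encoded from V8
    // strings on every request
    var http = require('http');
    var body = new Buffer(JSON.stringify({some: 'payload'}), 'utf8');
    http.createServer(function (req, res) {
        res.writeHead(200, {'Content-Length': body.length});
        res.end(body);
    }).listen(8000);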

------
robin_reala
Has anyone done any experiments with node.js and JägerMonkey? They’re getting
pretty close to V8 in speed ( <http://arewefastyet.com/> ) and might prove
better for server use (utter conjecture on my part).

~~~
midnightmonster
My understanding is that the Xmonkeys are not nearly so amenable to embedding
as node. There are some server implementations of older monkeys, though:

* whitebeam <http://whitebeam.org/>, which to my surprise appears to be not actually dead yet.

* jaxer <http://jaxer.org/>

* couchdb

But none of these seem to be low level enough to make a reasonable comparison.

Flusspferd <http://flusspferd.org/> seems most like node, but uses
SpiderMonkey (circa Firefox 3.5, it seems, but actively maintained, so
there's hope for future enhancements). Unfortunately, it doesn't yet seem to
have all the handy web serving stuff that node has, so it's probably still
not useful for a competitive benchmark.

------
newman314
I wonder what effect this will have in a memory-constrained environment on,
say, webOS.

The original Palm Pre only has 256MB of RAM and had plenty of "out of mem"
issues prior to homebrew adding compcache.

------
russell_h
Why does he pin this on the garbage collector?

~~~
hannesw
Just an educated guess. If you're allocating tons of objects and strings and
your app gets slow, it's very likely to be the GC. But I don't know V8 well
enough to say for sure.

~~~
xearl
When plotting the response time over time, this educated guess becomes even
more solid:

[http://earl.strain.at/2010/ringo-node-benchmark/buffer-alloc...](http://earl.strain.at/2010/ringo-node-benchmark/buffer-alloc-scatter.png)

(This uses the very same dataset underlying the "buffer alloc" graph in the
original post.)

------
jiaaro
It seems like node performs better at the bottom of the curve where most of us
are likely to be.

Can the performance issues be solved by adding hardware? (Or by spinning up
more node processes so that each one stays at the bottom of the curve?)

~~~
xearl
You're interpreting the graph incorrectly. There's no "bottom of the curve";
the graph is showing a distribution, not some property over time. A correct
reading will e.g. tell you that out of 50'000 requests, a total of ~30'000
requests completed in 100ms or less. Or, similarly, for what you call the
bottom of the curve: 5'000 out of 50'000 requests complete in ~25ms or less.
This does _not_ state which 5'000 requests those were, and in particular it
does _not_ imply that the _first_ 5'000 requests are faster. For that you'd
have to plot response time over time.
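
In code, the two readings differ like this (a toy sketch; the numbers are
made up):

    // `times` holds per-request response times in arrival order
    var times = [120, 15, 30, 300, 22];
    // the distribution graph: sort the times; index i then tells you that
    // i+1 requests finished in sorted[i] ms or less, but all information
    // about *when* each request happened is gone
    var sorted = times.slice().sort(function (a, b) { return a - b; });
    // behavior over time (like the scatter plot above) keeps the original
    // order: plot times[i] against request index i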

~~~
jiaaro
Ahh... for some reason I thought it was showing request volume.

------
plq
Python did not return any memory whatsoever to the operating system before
2.5.

<http://bugs.python.org/issue1123430>

------
joevandyk
"Confirms it may not be ready"? Really?

I've just confirmed I may have solved cancer.

