> Buffer 4.259 5.006
In v0.10, buffers are sliced off from big chunks of pre-allocated memory. That makes allocating buffers a little cheaper, but because each buffer maintains a back pointer to the backing memory, that memory isn't reclaimed until the last buffer referencing it is garbage collected.
Buffers in node.js v0.11 and io.js v1.x instead own their memory. That reduces peak memory usage (because memory is no longer allocated in big chunks) and removes a whole class of accidental memory leaks.
That said, the fact that it's sometimes slower is definitely something to look into.
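As a rough illustration of the v0.10 behaviour described above - pool sizes and thresholds here are simplified, not the actual node internals:

    // Hypothetical sketch: in v0.10, small buffers are sliced from a shared
    // pre-allocated slab, so a single surviving buffer pins the whole chunk.
    var kept = [];
    for (var i = 0; i < 1000; i++) {
      var b = new Buffer(100);          // sliced from a shared slab in v0.10
      if (i % 100 === 0) kept.push(b);  // keep only 1 in 100 alive
    }
    // The ~990 unreachable buffers' memory can't be reclaimed while any
    // slab-mate in `kept` is still referenced - the class of accidental
    // leak that per-buffer allocation in v0.11/io.js removes.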
> Typed-Array 4.944 11.555
Typed arrays in v0.10 are a homegrown and non-conforming implementation.
Node.js v0.11 and io.js v1.x use V8's native typed arrays, which are indeed slower at this point. I know the V8 people are working on them; it's probably just a matter of time, although more eyeballs certainly won't hurt.
> Regular Array 40.416 7.359
Full credit goes to the V8 team for that one. :-)
I will check out what happened to Buffer/TypedArray. Should not degrade that much unless something really goes south here.
The first major one is related to the mortality of TypedArray maps (aka hidden classes). When the typed array stored in the Data variable is GCed and there are no other Uint8Arrays in the heap, its hidden class is GCed too. This also causes the GC to find and discard all optimized code that is specialized for Uint8Arrays and to clear all type feedback related to Uint8Arrays from inline caches. When we later come back and reoptimize, the optimizing compiler takes the cleared type feedback to mean that we need to emit a generic access through the IC (there is reasoning behind that), because such an access is potentially going to be polymorphic anyway. I have filed an issue for the root cause (the mortality of the typed array's hidden class).
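To make that concrete, here is a minimal sketch of the pattern being described (variable names are illustrative; run with --expose-gc so gc() is available):

    // `data` holds the only Uint8Array in the heap. When it is dropped and
    // collected, the Uint8Array hidden class can die with it, discarding
    // code optimized for Uint8Array and clearing its type feedback.
    var data = new Uint8Array(1 << 20);

    function sum(arr) {
      var s = 0;
      for (var i = 0; i < arr.length; i++) s += arr[i]; // specialized keyed load
      return s;
    }

    for (var run = 0; run < 10; run++) {
      sum(data);                       // warm up; optimized for Uint8Array
      data = null;                     // drop the only Uint8Array reference
      gc();                            // hidden class may be collected here
      data = new Uint8Array(1 << 20);  // reoptimization sees cleared feedback
    }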
Now there is a second, much smaller issue (which also explains the performance of the Buffer case): apparently there were some changes in the optimization thresholds and OSR heuristics. After these changes we hit OSR at a different moment: e.g. I can see that we now hit the inner loop (the one that iterates over `j`) instead of the outer loop, which would lead to better code. In V8, OSR is implemented in a way that tries to produce optimized code that is suitable both for OSR and as normal function code. This is done by adding a special OSR entry block that jumps to the preheader of the loop selected as the OSR target, which allows V8 to reuse the same optimized code for the normal entry without optimizing again. But it also causes code quality issues if OSR does not hit the outer loop, because the OSR entry block inhibits code motion. This is a known problem and there are plans to fix it. The hit is usually quite small unless you have very tight nested loops (like in this case).
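For reference, the loop nest the OSR heuristics are choosing between looks roughly like this (a simplified Sieve, not the benchmark's exact code):

    function sieve(limit) {
      var composite = new Uint8Array(limit);
      for (var i = 2; i < limit; i++) {          // outer loop: the better OSR target
        if (composite[i]) continue;
        for (var j = i * i; j < limit; j += i) { // inner `j` loop: worse target, since
          composite[j] = 1;                      // an OSR entry here inhibits code motion
        }
      }
    }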
Disabling OSR (--nouse-osr) not only "solves" the second issue, it also partially fixes (or rather hides) the first one: 1) we no longer optimize with partial type feedback, so we never emit a generic keyed access but always specialize it for the typed array; 2) we no longer emit an OSR entry, hence no code quality issues related to it.
The same happens when I use the --nouse-osr parameter that you mentioned.
On my machine results are fluctuating within the same ballpark (though I am on Linux and benchmarking 64-bit builds).
And Node 0.10.35 shows basically the same results as before. I see less than 1% difference - maybe just random fluctuation, and even if not, 1% is irrelevant.
In that post I also benchmarked the various fixes for the typed-array slowdown you mentioned. BTW --nouse-osr makes all three tests run faster.
I posted this reply on your site, but I will duplicate it here for the sake of HN readers:
> BTW --nouse-osr makes all three tests run faster.
As I tried to explain above: OSR as it is implemented now impacts code quality depending on which loop OSR hits, which in turn depends on the heuristics that V8 uses. These heuristics are slightly different in newer V8. As a result of these changes, V8 hits the inner loop instead of the outer loop, and this leads to worse code.
Code that benefits from OSR is code that contains a loop which a) can be well optimized, b) runs long, and c) is run only a few times in total. The Sieve benchmark is the opposite of this, and as a result it doesn't benefit from OSR: you get a bigger penalty from producing worse code and no benefit from optimizing slightly earlier.
Not using OSR for Sieve also hides the other issue with the mortality of typed arrays' hidden classes. I say "hides", not "fixes", because one can easily construct a benchmark where the mortality would still be an observable performance issue even if the benchmark itself is run without OSR: https://gist.github.com/mraleph/2942a14ef2a480e2a7a9
Interesting background about typed-arrays. I didn't know that. Thanks!
It would have been a good idea to include that comment in the article as well.
I.e. the difference isn't necessarily node vs. io.js; it's one point release of V8 to the next, as used by node and io.js.
Now I wonder how node 0.11.x compares to iojs :)
2015-01-19 12:54 UTC
io.js is based on node v0.11, so you need to compare:
- v0.10 (nodejs) vs. v0.11 (nodejs)
- v0.11 (nodejs) vs. v1.0 (iojs)
Author: Michael Schöbel
2015-01-19 13:01 UTC
I also downloaded the sources and compiled the latest master branch of Node yesterday evening. Performance was within 2% of io.js for all three tests.
But most people won't compile themselves. Most will use the latest stable release.
I mean there is a good reason that
I'd recommend running a thorough suite of performance tests on different OSes (CoreOS, Ubuntu) that are actually used in server environments. Different machine hardware will also play a role.
There's not enough data to come to any conclusion at this point, imo.
This result set is like saying that 95% of the visitors to the Apple website use the Safari browser.
The main issue from my perspective is that the event loop can easily get blocked by CPU-bound tasks, preventing it from doing other things, e.g. responding to HTTP requests. You hit a similar problem with a Java servlet runner: if a couple of your threads are bogged down with CPU-heavy tasks, they can't be responding to requests.
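A minimal sketch of that failure mode (the port and the busy loop are made up):

    var http = require('http');

    http.createServer(function (req, res) {
      if (req.url === '/heavy') {
        var x = 0;
        for (var i = 0; i < 1e9; i++) x += i;  // CPU-bound: blocks the event loop
        res.end('heavy: ' + x);
      } else {
        res.end('fast');  // stalls behind /heavy, even though it does no work
      }
    }).listen(8080);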
My personal preference would be to split CPU-heavy operations out so that they happen elsewhere, regardless of language, e.g. having large PDFs generated by an internal microservice rather than by the webserver, or maybe via a queue in some cases (see the sketch after the list below). But that's just a personal preference.
* Write the CPU-heavy code in C++ and bridge it back to node.js as an add-on: http://nodejs.org/api/addons.html
* node.js is built atop libuv, so use libuv's work queues to offload the CPU-heavy code to a worker thread: http://nikhilm.github.io/uvbook/threads.html
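A third option that stays in JavaScript is to push the work into a separate process, in the spirit of the "split it out" preference above. A minimal sketch - the file names are hypothetical, and a real setup would pool workers rather than fork per request:

    // main.js
    var http = require('http');
    var fork = require('child_process').fork;

    http.createServer(function (req, res) {
      var worker = fork('./heavy-worker.js');      // hypothetical worker script
      worker.once('message', function (result) {
        res.end('done: ' + result.value);
      });
      worker.send({ n: 1e9 });
    }).listen(8080);

    // heavy-worker.js - burns CPU without touching the server's event loop
    process.on('message', function (msg) {
      var s = 0;
      for (var i = 0; i < msg.n; i++) s += i % 7;  // stand-in for real work
      process.send({ value: s });
      process.exit(0);
    });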
Companies I worked for always had two environments, one for new features and one for performance.
Like PHP and C: new features were implemented in PHP, and if they caught on and needed better performance, they were reimplemented in C.
Typically I see client-side performance concerns outweighing server-side performance in a ratio of about 70/30, with the remaining server-side time dominated by I/O concerns like waiting for data or file-system reads, at a ratio of 90/10 or more. That puts the actual saving available to language or algorithm changes in the app layer at less than 3% (30% of time server-side × 10% of that non-I/O) for the kinds of apps I've worked on.
I usually work at companies that are starting out, looking for Product Market Fit, where those marginal gains aren't worth the cost of reimplementing.
That's not a problem with node, where you can easily implement a distributed queue system. The job queue processes will block, but not your web server.
 - https://www.npmjs.com/package/webworker-threads
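Assuming that package follows the standard Web Worker API as its README describes (treat the exact calls as an assumption, not gospel), usage would look roughly like:

    var Worker = require('webworker-threads').Worker;  // API assumed from the package docs

    var worker = new Worker(function () {
      this.onmessage = function (event) {          // runs on a separate thread
        var n = event.data, s = 0;
        for (var i = 0; i < n; i++) s += i % 7;    // CPU-heavy work off the event loop
        postMessage(s);
      };
    });

    worker.onmessage = function (event) {
      console.log('result:', event.data);
      worker.terminate();
    };
    worker.postMessage(1e8);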
You have to judge for yourself which will be a better use of resources: more cost upfront for programming, or more cost in the long run for servers.
There's probably also a way to call Java (a quick search suggests https://github.com/joeferner/node-java perhaps), but I can't speak to that in particular.
I'm going to go out on a limb and say that the proportion of real-world Node apps that will be noticeably affected by this is less than 1%.