Crazy fast, ultra-low memory usage, and easy to integrate into our codebase. The author is hilarious and deeply cares about performance.
Easily the best C++ WebSocket library. I'm not at all surprised Alex has managed to get some additional performance out of HTTP on Node.js as well.
Alex has always been very responsive and helpful, and his focus on performance is extremely refreshing in the face of the webdev world's "eh, good enough" mentality.
nginx is a fully fledged web server with logging enabled out of the box, among other bells and whistles. Just having logs enabled, for example, adds significant load on the server because of log formatting, writes to disk, etc.
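For what it's worth, here is a minimal sketch of what a fairer nginx config might look like with logging disabled entirely (assuming a stock nginx build; the port and response body are just illustrative):

```nginx
worker_processes auto;

events {
    worker_connections 4096;
}

http {
    access_log off;           # no log formatting, no disk writes
    error_log /dev/null crit; # keep only critical errors

    server {
        listen 3000;
        location / {
            return 200 "Hello World!";
        }
    }
}
```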
At the very least include the configs of each server tested.
I haven't had the time to add configurations for every server tested (esp. Apache & NGINX) but the main point here is to showcase the Node.js vs. Node.js with µWS perf. difference.
If not, then you should have taken the time to provide the information for a fair comparison with the other stacks.
As it is, you're just asking the community to take your word for it.
I think it's completely understandable that he threw in the others, probably default config, without caring much about it since they weren't the point of the writeup.
For example, in my job, since none of the frontend APIs need to handle that many requests at once, we're considering setting up a few node "frontend APIs" to lift application complexities from our JS single-page app up one level. Stuff like having to hit multiple inconsistent APIs, dealing with formatting issues, etc. If you have a single API it seems much easier to deal with that, as well as to expand it as time goes on. But due to lack of knowledge and experience, I don't have as much confidence in pushing this decision as I'd like. We'll obviously end up investing time and effort in benchmarks to make sure it meets our requirements first, but since we're a startup that's not so large, we can't realistically afford to dump THAT much time into something that doesn't end up getting us some clear benefits.
A bit related to the topic... I know it's not exciting and sexy, but I wish more people wrote about larger non-trivial applications: how they ended up tackling the challenges they encountered, and details of the kinds of scale they handled, both with respect to architecture and scaling. Maybe it's my lack of experience, but I find it really difficult to guess at how much money certain things will end up costing before doing a "close-to-real-world implementation".
It would be a fun experiment to implement a native HTTP module in Rust using Neon. https://github.com/neon-bindings/neon
>Japronto's own (ridiculous) pipeline script
are you trolling? :)
Obviously they can't, and the reason is that it isn't a significant benchmark. It doesn't actually do anything.
You tune the heck out of node.js and then take other tools without tuning them (JVM, Apache, nginx, etc.), give them a ridiculous task that you'll never find in the real world, and present your results as if they are meaningful.
Why do people still waste time doing it?
As for why someone would do this: maybe they don't know better, or maybe they are doing it because they have decided to invest in some technology tribe and thus profit from that tribe surviving, and even more from it growing. This is a pretty automatic behavior for humans. Take any tribal war, e.g. XBox One vs PS4. Those who happen to own an XBox One (perhaps as a gift) can be seen at various places passionately arguing that the XBox One is better than the PS4, even if objectively it has worse hardware and fewer highly acclaimed exclusive games. The person is in the XBox One tribe, and working towards getting more users to own an XBox One means that more developer investment is also made in the XBox One thanks to the bigger userbase. Thus even if the original claims that got users into the tribe were false, if growth is big enough it may work out well enough in the end.
The RethinkDB postmortem had a great paragraph about these microbenchmarks:
> People wanted RethinkDB to be fast on workloads they actually tried, rather than “real world” workloads we suggested. For example, they’d write quick scripts to measure how long it takes to insert ten thousand documents without ever reading them back. MongoDB mastered these workloads brilliantly, while we fought the losing battle of educating the market.
And (this is just my personality) I don't like being bothered by something without trying to "solve" it. So here's my best thought on how to handle the situation where a team feels that they have a superior product which is losing out to another product that is optimized for benchmarks:
> Provide a setting called something like "speed mode". In this mode it is completely optimized for the benchmarks, at the cost of everything else. Default to running without "speed mode", but for anyone who is running benchmarks ask them if they've tried it in "speed mode". A truly competent evaluator will insist on trying the system with the options that are really used in the real world, but then the competent evaluator won't be using an unreliable benchmark anyway. Anyone running the benchmarks just to see how well it works will be likely to turn on something named "speed mode", or at least to do so if asked to. Forums will eventually fill up with people recommending "for real-world loads, you should disable 'speed mode' as it doesn't actually speed them up".
Hmm... sounds cool, but I'm not so sure it would actually work. The danger is that you would instead develop a reputation for "cheating" on benchmarks. This is why I'm not very good at marketing.
Many readers should have a feel for their own use cases and be able to relate them to "hello world" benchmark numbers; for example, I immediately divide the stated performance by 10 if a simple DB query is involved, etc.
Also, if you are already using one of the compared setups today, you should know what performance you currently get and what tuning went into it, which gives you a point of reference.
Additionally, the ultimate microbenchmark-winning code is code that does every trick in the book while not caring about anything else. This means hooking the kernel, unloading every kernel module / driver that isn't necessary for the microbenchmark, and doing the microbenchmark work at ring 0 with the absolute minimum overhead. Written in ASM, embedded in C code, launched from node.js. Then, if there's any data-dependent processing in the microbenchmark, the winning code will precompute everything and load the full 2 TB of precomputed data into RAM. The playing field is even; the JVM & Apache, or whatever else is the competition, will also run on this 2 TB RAM machine, of course. They just won't use it, because they aren't designed to deliver the best results in this single microbenchmark. The point is that not only do microbenchmark results not imply linear scaling for other work, the techniques used to achieve them may even be detrimental to everything else!
For some data sets QuickSort is actually faster. Goes to show you that the best choice is highly dependent on actual use.
Certainly they don't. But when evaluating something like this, it is up to the reader to apply critical thinking and have realistic expectations about the level of experimental design in an admittedly alpha implementation published on a GitHub wiki, versus something published in a peer-reviewed journal.
Secondly, the post doesn't seem to mention this, but I'm willing to bet that this microbenchmark, like all others like it, is doing all these millions of requests from a single client located on the same machine.
How many real world use cases are there where a single localhost client will do a million requests per second and also supports HTTP/1.1 pipelining?
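For context, here is a sketch of what such a pipelining client boils down to, using Node's built-in net module (the port and request count are made up for illustration):

```js
// HTTP/1.1 pipelining: concatenate many requests on one socket without
// waiting for responses, something almost no real browser or proxy does.
const net = require('net');

const socket = net.connect(3000, '127.0.0.1', () => {
  // One write carrying 16 pipelined requests; the server can parse and
  // answer them back-to-back with no per-request network round trip.
  const request = 'GET / HTTP/1.1\r\nHost: localhost\r\n\r\n';
  socket.write(request.repeat(16));
});

let bytes = 0;
socket.on('data', (chunk) => {
  bytes += chunk.length; // a real client would parse 16 responses here
});
```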
Just because the word "million" seems impressive, it doesn't mean much. There is a difference between a million photons hitting a tree and a million meteors hitting a tree. The rest of the context is important.
That's both trivial AND useless information. The request handler could be doing an expensive two-hour operation that pegs a core at 100% for all we know. That's up to the web programmer to optimize.
The http-lib programmer, on the other hand, should optimize, and give numbers for, exactly what the library does: nothing more, nothing less.
People seem to conflate those responsibilities all the time when they see a benchmark. An http-parser benchmark's role is not to tell you how fast your app will serve requests.
What you said may be true for multithreaded apps, but resources are shared in Node.js.
It can multiplex operations at the event level, however, and all its common libraries follow that model. So while it might run "on one thread", it can leverage the CPU quite efficiently. And you can always run multiple processes.
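For instance, here is a minimal sketch using Node's built-in cluster module to run one worker per core (the port number is illustrative):

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core; workers share the listening socket.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  http.createServer((req, res) => {
    res.end('Hello World!');
  }).listen(3000);
}
```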
So, for example, let's say your application has to parse a JSON POST body, talk to a database, and then serialize a JSON response. You'll be lucky to get 1k reqs/sec of throughput. At that point it doesn't actually matter whether your http module can handle 65k reqs/sec or 1 million reqs/sec, because you will never be able to serve that many anyway. If your http module did manage to pick up 65k reqs/sec from clients, they would all just time out.
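Something like this sketch, where db.query() is a hypothetical stand-in for any real database driver; the awaited round trip to the database dominates, not the HTTP parsing:

```js
const http = require('http');

// Hypothetical stand-in for a real database driver; a real query would
// leave the process and wait on the network, dominating response time.
const db = {
  query: async (sql, params) => [{ sql, params }],
};

http.createServer((req, res) => {
  let body = '';
  req.on('data', (chunk) => { body += chunk; });
  req.on('end', async () => {
    const payload = JSON.parse(body);              // parse JSON POST body
    const rows = await db.query(
      'SELECT * FROM items WHERE owner = $1',      // talk to a database
      [payload.owner]
    );
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify(rows));                 // serialize JSON response
  });
}).listen(3000);
```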
These benchmarks reach those numbers by doing nothing but serving a tiny static string, but that's not what happens in real life. In summary, these benchmarks are interesting, but they optimize an area which isn't actually the thing holding back most backend servers from serving more requests per second.
>Memory management and IO aren't gone just because your http stack is fast.
Obviously yes again. But they are helped by it.
On the contrary, that's the only useful type of benchmark.
I don't care for "load simulation" full benchmarks, with loads and usage patterns that will invariably differ from mine, and which tell me very little.
Microbenchmarks on the other hand, are constrained to very specific situations (like the above query), and as such can be very precise in the numbers they give.
I know that if I use a similar machine and Node version, and have the same query, I will get the same performance.
And that's exactly what programmers use to identify pain points ("hmm, this kind of response handling is slow") and fix them: isolated and targeted microbenchmarks.
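For example, a minimal sketch of such a targeted microbenchmark using Node's process.hrtime(), here timing JSON serialization (the payload and iteration count are arbitrary):

```js
// Time one specific operation in isolation, here JSON serialization.
const payload = { id: 42, name: 'example', tags: ['a', 'b', 'c'] };
const iterations = 1e6;

let sink = 0; // consume results so the JIT can't skip the work
const start = process.hrtime();
for (let i = 0; i < iterations; i++) {
  sink += JSON.stringify(payload).length;
}
const [sec, nano] = process.hrtime(start);
console.log(`~${((sec * 1e9 + nano) / iterations).toFixed(0)} ns per op (sink=${sink})`);
```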
I don't care for a "full" benchmark to tell me that "things will slow down with business logic and DB queries". Well, DUH!
Yes, but you don't know exactly how much those things will differ between languages, which is important. For instance, you may say "wow, Node smokes Java in this echo server benchmark, I'm sold!", only to later find out that, e.g., DB queries run 3x slower in Node than in Java. Suddenly a more real-world benchmark makes sense...
No, then I just need an additional DB query microbenchmark.
Sure, a lot of companies like to publish benchmarks like this to make their product look better than it is (e.g. only showing the good parts), but in this case I know the author and can vouch that he is independent and uncompromising.
You could argue that the baseline performance of a library doesn't matter as much once you start adding lots of custom logic on top, but it's still highly relevant for lightweight workloads (which are actually quite common, e.g. basic chat systems).
No, it shows that they care about benchmarks, which are rarely if ever representative of real-life scenarios.
For one, JS is also JITed. Second, we have video players and other demanding tasks implemented in pure JS, which would be impossibly slow in, say, Python.
Third, JS can also be compiled -- there's asm.js, and WebAssembly is coming down the road.
So, yes, it might be slower than the JVM, but not that slower for most practical purposes.
But not all JITs are equal; that's like putting Brainfuck in the mix because it has a JIT. It is worth noting that the JVM's JIT has years of research behind it, and being statically typed only adds to the benefits.
> So, yes, it might be slower than the JVM, but not that slower for most practical purposes.
Sure, my point is that the "not that slower" varies a lot depending on the kind of computation you run, and the notion that these dynamic languages are fast enough just perpetuates the misunderstanding that there is a free lunch...
I hear that a lot and it's a moot point. It's not like the same research is unavailable to those building the JS JITs. Unless we're talking about patents, techniques for faster JITing are widely known and propagate to newer languages and runtimes all the time.
And in fact, even the people are usually the same (e.g. people who built the initial fast JITs in the days of Smalltalk, then went to the JVM, and now work on V8).
>Sure, my point is that the "not that slower" varies a lot depending on the kind of computation you run, and the notion that these dynamic languages are fast enough just perpetuates the misunderstanding that there is a free lunch...
Well, certainly fast enough for web apps, where we have been using 10x slower languages with no JITs and huge overheads.
Compared to what? Python? Ruby? PHP?
None of which can do "parallelism properly", and all of which are from 2x to an order of magnitude slower than V8 (Node's engine).
Even Go can't do parallelism properly -- its model is full of manual handling of deadlocks and races. Erlang, yes.
I'm a bit confused by what's going on here. Are you saying the network stack required for WebSockets is vastly superior to the network stack for HTTP, and hence using a WebSocket network stack for HTTP calls can produce superior results? (I didn't know the underlying networking would be different, and any clarity would be helpful.)
I'm not really understanding the differences but it is definitely interesting nonetheless.
A socket in the networking stack of µWS is far more lightweight (as has already been shown with µWS's WebSockets). The "HttpSocket" of µWS is about as lightweight in memory usage as its "WebSocket", which is far more lightweight than net.Socket in Node.js.
One million WebSockets require about 300 MB of user-space memory in µWS, while this number is somewhere between 8 and 16 GB of user-space memory using the built-in Node.js http server. That works out to roughly 300 bytes per connection versus 8-16 KB per connection.
µWS is a play on its "micro" (small) sockets.
It's like saying Django has a lot of bloat in comparison to some super basic http lib, except it has all the features I'll need to build a non-trivial app.
It really is stunning, and yes, microbenchmarks are very important to me and my product. I personally really do want to know how much every piece costs so I can budget memory, cycles, and machines. So thanks for providing the data, even if it is slightly "ballpark".
We use it in our server as well (and have done for ages), and uWS just plain rocks.
As a community we have to work on addons and make Node the truly versatile and performant platform it should be :)
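For reference, the swap itself is tiny, assuming the ws-compatible Server API that the uws package advertised at the time:

```js
// The line being swapped; everything below is the standard ws-style API.
// const WebSocketServer = require('ws').Server;
const WebSocketServer = require('uws').Server;

const wss = new WebSocketServer({ port: 3000 });

wss.on('connection', (ws) => {
  ws.on('message', (message) => {
    ws.send(message); // simple echo
  });
});
```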
I think ws will use 3-4 times more memory than uws with permessage-deflate disabled, which is a lot, but far from the 47x advertised.
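If you want to check such claims yourself, here is a rough sketch: hold many idle connections and watch process.memoryUsage().rss, running the same script against both ws and uws, with permessage-deflate on and off (the threshold and port are illustrative):

```js
// Server holding idle sockets; swap require('ws') for require('uws')
// and rerun to compare. perMessageDeflate is off here, as discussed above.
const WebSocketServer = require('ws').Server;

const wss = new WebSocketServer({ port: 3000, perMessageDeflate: false });

const baseline = process.memoryUsage().rss;
let connections = 0;

wss.on('connection', () => {
  connections++;
  if (connections % 10000 === 0) {
    const perSocket = (process.memoryUsage().rss - baseline) / connections;
    console.log(`${connections} sockets: ~${Math.round(perSocket)} bytes each`);
  }
});
```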
For example, they're using Apache as a reference point, but Apache does so much more than their code example. For one thing, you'd want to disable .htaccess support and static file serving so Apache doesn't actually hit the disk, just like their code example doesn't.
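A sketch of the kind of Apache tuning that would make the comparison fairer (exact directives depend on the Apache version and loaded modules):

```apache
# Stop per-request .htaccess lookups (disk stat() calls on every path segment)
<Directory "/var/www">
    AllowOverride None
</Directory>

# Send access logs nowhere, like the benchmarked Node.js code effectively does
CustomLog /dev/null common
```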
I've found it trivial to make Python perform on the order of dozens of millions of requests per second, and I can keep scaling that basically indefinitely. But all I'm really testing, as is the given code example in the article, is a bit of looping and string manipulation.
Really curious. How did you achieve that? When you say "dozens of millions", it implies a minimum of 24+ million requests per second, which is quite unbelievable.
Could you share some examples or snippets?
No, no you haven't. This is not a cluster of servers, this is one single thread serving 1 million responses per second. Inside of Node.js.