
More than one million requests per second in Node.js - alexhultman
https://github.com/uWebSockets/uWebSockets/wiki/(More-than)-one-million-requests-per-second-in-Node.js
======
erichocean
Happy µWS "customer" here. I've been using the C++ library standalone in
production since October (i.e. without Node.js).

Crazy fast, ultra-low memory usage, and was easy to integrate into our
codebase. Author is hilarious and deeply cares about performance.

Easily the best C++ WebSocket library. I'm not at all surprised Alex has
managed to get some additional performance out of HTTP on Node.js as well.

~~~
spiderkeys
We are also using the uWS C++ library in production and have been extremely
pleased with the performance. Integrating it was trivial and we haven't had
any issues.

Alex has always been very responsive and helpful and his focus on performance
is always extremely refreshing in the wake of the webdev world's "eh, good
enough" mentality.

------
halayli
Often, webserver benchmarks are misleading because of how the tests were
done.

nginx is a fully fledged webserver with logging enabled out of the box, among
other bells and whistles. Just having logs enabled, for example, adds
significant load to the server because of log formatting, writes to disk, etc.

At the very least include the configs of each server tested.

~~~
henridf
And details on the wrk (load gen) setup too, please.

~~~
alexhultman
The pipelining benchmark is identical to that of Japronto (another, very
similar thing posted here on HN a few days ago). Japronto's repo on GitHub
holds the wrk pipelining script used.
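
For readers unfamiliar with pipelining: a wrk pipelining script just
concatenates several complete requests into one write. A minimal sketch of
such a payload in Node.js (the depth of 16 is illustrative, not the
benchmark's actual setting):

```javascript
// HTTP/1.1 pipelining: send several complete requests back-to-back
// on one connection without waiting for the responses in between.
// The server can then answer them in one batch, amortizing syscall
// and parsing overhead -- which is what inflates the req/sec numbers.
const DEPTH = 16; // illustrative depth, not the benchmark's value
const request = 'GET / HTTP/1.1\r\nHost: localhost\r\n\r\n';
const pipelinedPayload = request.repeat(DEPTH);

// One write() of this string carries DEPTH requests at once.
console.log(`${DEPTH} requests in ${pipelinedPayload.length} bytes`);
```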

I haven't had the time to add configurations for every server tested (esp.
Apache & NGINX), but the main point here is to showcase the performance
difference between stock Node.js and Node.js with µWS.

~~~
pdimitar
How did you not have the time? Apologies, I might be missing something, but
was this an emergency work assignment?

If not, then you should have taken the time to provide the information for a
fair comparison with the other stacks.

As it is, you're just asking the community to take your word for it.

~~~
SparkyMcUnicorn
We don't need to take his word for it. It's open source, so we can run the
tests ourselves.

I think it's completely understandable that he threw in the others, probably
default config, without caring much about it since they weren't the point of
the writeup.

------
TheAceOfHearts
Although it echoes the point the other comments are making, I wish more
information were readily available (or maybe it is, and I'm just not aware of
where to look for it?) about what real-world performance is like in different
cases.

For example, in my job, since none of the frontend APIs need to handle that
many requests at once, we're considering setting up a few Node "frontend APIs"
to lift application complexities from our JS single-page app up one level.
Stuff like having to hit multiple inconsistent APIs, dealing with formatting
issues, etc. If you have a single API it seems much easier to deal with that,
as well as expand it as time goes on. But due to lack of knowledge and
experience, I don't have as much confidence pushing this decision as I'd
like. We'll obviously end up investing time and effort in benchmarks to make
sure it meets our requirements first, but since we're a startup that's not so
large, we can't realistically afford to dump THAT much time into something
that doesn't end up getting us some clear benefit.

A bit related to the topic... I know it's not exciting and sexy, but I wish
more people wrote about larger non-trivial applications and how they end up
tackling the challenges they encountered and details of the kinds of scales
they handled. Both with respect to architecture and scaling. Maybe it's my
lack of experience, but I find it really difficult to guess at how much money
certain things will end up costing before doing a "close-to-real-world
implementation".

~~~
cel1ne
Look into PostgREST as an alternative to GraphQL, and use the database as a
flexible middle layer to construct your API on demand.

[http://postgrest.com/en/v0.4/](http://postgrest.com/en/v0.4/)

~~~
StreamBright
We did that, due to a lack of experience with GraphQL. We use Postgres as a
transactional key-value store (with a proper schema, though). We implemented
the filtering as simple params to the API; not as flexible as GraphQL, but
straightforward to implement on the backend side. I am not sure what
"inconsistent APIs" means here, though.

------
jterry
This looks interesting. I'm surprised there aren't many existing native HTTP
modules for Node.js. Found websockets/ws as an alternative:
[https://github.com/websockets/ws](https://github.com/websockets/ws)

It would be a fun experiment to implement a native HTTP module in Rust using
Neon. [https://github.com/neon-bindings/neon](https://github.com/neon-bindings/neon)

------
c8g
>HTTP pipelining (made famous by Japronto)

>Japronto's own (ridiculous) pipeline script

are you trolling? :)

~~~
E6300
He is (a little bit).

------
xorcist
Really? 5x faster than plain nginx? That's... remarkable, if true. I can't
seem to find the sources for that benchmark, however.

~~~
throwawayish
Wholly irrelevant, just like the Python "benchmark" that was on the front page
yesterday.

~~~
xorcist
Irrelevant for what? It's opening and closing an unfathomable number of
sockets in a short time span, so it would be limited by how many the kernel
can handle, and maybe subject to limitations in the glibc epoll() wrapper as
well. So it's not irrelevant if you want to benchmark some change in the
kernel, for example. (There is an HTTP parser in there as well, but I don't
think even replacing it with a dummy one would quadruple total throughput,
which is why I'm skeptical. The Python thing didn't claim performance above
nginx.)

------
PunchTornado
This kind of benchmark is completely useless.

You tune the heck out of Node.js, take other tools without tuning them (JVM,
Apache, nginx, etc.), give them a ridiculous task that you'll never find in
the real world, and present your results as if they are meaningful.

Why do people still waste time doing it?

~~~
romanovcode
Actually, it's kind of funny how the Node community is obsessed with
benchmarks and speed, because Node is very slow on CPU-bound work and cannot
even do parallelism properly.

~~~
anilgulecha
That's completely incorrect. The well-known cluster module allows very easy
parallelism at any level in your application. Plus, Node is among the fastest
interpreted languages around today -- coming close to the JVM in performance.

~~~
sanjayts
I am awfully wary of statements that paint languages like Python (via PyPy)
and JavaScript (via Node) as very close competitors of the JVM. Once the JIT
engine kicks in, on "real" workloads, the JVM beats the lights out of these
carefully tuned interpreted languages on CPU-intensive work.

~~~
coldtea
> _Once the JIT engine kicks in, on "real" workloads, JVM beats the lights out
> of these carefully tuned interpreted languages on a CPU intensive workload._

For one, JS is also JITed. Second, we have video players and other tasks done
in native JS which would be impossibly slow in, say, Python.

Third, JS can also be compiled -- there's asm.js, and WebAssembly is coming
down the road.

So, yes, it might be slower than the JVM, but not that much slower for most
practical purposes.

~~~
sanjayts
> For one, JS is also JITed

But not all JITs are equal; that's like putting Brainfuck in the mix because
it has a JIT. It is worth noting that the JVM JIT has years of research behind
it, and being statically typed only adds to the benefits.

> So, yes, it might be slower than the JVM, but not that slower for most
> practical purposes.

Sure; my point is that the "not that much slower" varies a lot depending on
the kind of computation you run, and the notion that these dynamic languages
are fast enough just perpetuates the misunderstanding that there is a free
lunch...

~~~
coldtea
> _It is worth noting that JVM JIT has years of research_

I hear that a lot and it's a moot point. It's not like the same research is
not available to those doing the JS JITs. Unless we're talking about patents,
techniques for faster JITing are widely known, and get propagated to newer
languages and runtimes all the time.

And in fact, even the people are usually the same (e.g. people who started
the initial fast JITs in the days of Smalltalk, then went to the JVM, and now
work on V8).

> _Sure, my point is that the "not that slower" varies on lot depending on the
> kind of computation would run and having a notion that these dynamic
> languages are fast enough just perpetuates the misunderstanding that there
> exists free lunch..._

Well, certainly fast enough for web apps, where we have long been using
10x-slower languages with no JITs and huge overheads.

------
ttt111222333
> As http sockets in µWS build on the same networking stack as its websockets,
> you can easily scale to millions of long polling connections, just like you
> can with the websockets. This is simply not possible with the built-in
> Node.js http networking stack as its connections are very heavyweight in
> comparison.

I'm a bit confused by what's going on here. Are you saying the network stack
required to do websockets is vastly superior to the network stack of http, and
hence using a websockets network stack in http calls can produce superior
results? (I didn't know the underlying networking would be different and any
clarity would be helpful).

I'm not really understanding the differences but it is definitely interesting
nonetheless.

~~~
alexhultman
What I mean by this is that any connection (socket) in Node.js builds on
net.Socket, which builds on uv_tcp_t, which together require a significant
amount of memory (bloat).

A socket in the networking stack of µWS is far more lightweight (as has
already been shown for µWS's websockets). The "HttpSocket" of µWS is about as
lightweight in memory usage as its "WebSocket", and both are far more
lightweight than net.Socket in Node.js.

One million WebSockets require about 300 MB of user-space memory in µWS,
while the figure is somewhere between 8 and 16 GB of user-space memory using
the built-in Node.js http server.

µWS is a play on its "micro" (small) sockets.

~~~
azinman2
I feel like "bloat" is thrown about so much these days, with little effort to
actually define it in a per-situation context. It would be far more credible
to me to drop such a handwavey term and instead talk about what the memory
differences are, and why one might use much less memory than the other.
Often, one person's "bloat" is another's necessary feature for accomplishing
their goals.

It's like saying Django has a lot of bloat in comparison to some super basic
http lib, except it has all the features I'll need to build a non-trivial app.

------
socmag
Fantastic work Alex. I sent you email earlier when I saw this.

It really is stunning, and yes microbenchmarks are very important to me and my
product. I personally really do want to know how much every piece costs so I
can budget memory cycles and machines. So thanks for providing the data. Even
if it is slightly "ballpark".

We use it in our server as well (and have done for ages), and uWS just plain
rocks.

Highly recommended

------
rafaelferreira
I'd love to see a design document explaining the differences from, say,
nginx, that enable this kind of performance.

------
albertTJames
Thanks for this amazing work! Can't wait to use it.

As a community we have to work on addons and make Node the truly versatile
and performant platform it should be :)

[https://nodejs.org/api/addons.html](https://nodejs.org/api/addons.html)

~~~
alexhultman
I definitely agree that the Node.js universe needs to take a better look at
addons. My opinion is that one should use JS only for application logic, and
implement core modules (mostly or entirely) as addons. It makes sense to use
JS where productivity matters and to skip it where performance matters.

------
bhouston
This is pretty great stuff. Please keep it up and don't pay attention to the
naysayers. This type of optimization is great and will pay dividends down the
road for a lot of projects if this can take off.

------
yunda
What if the standard http module were replaced with this in Express? Would it
work?

------
diegorbaquero
uWS certainly reduces the overhead to a minimum, saving lots of memory that
can be used to scale up and leaving more CPU for your app's code. I wrote this
article a few months ago when I switched to uws in the WebTorrent tracker:
[https://hackernoon.com/%C2%B5ws-as-your-next-websocket-libra...](https://hackernoon.com/%C2%B5ws-as-your-next-websocket-library-d34209686357?source=linkShare-a1d0f3f9aca2-1486218881)

~~~
lpinca
Did you ever try disabling permessage-deflate in ws? ws will never be as
lightweight as uws because it is built on top of `net.Socket`, but I think you
hit this ws issue in the WebTorrent tracker:
[https://github.com/websockets/ws/issues/804](https://github.com/websockets/ws/issues/804)

I think ws will use 3-4 times more memory than uws with permessage-deflate
disabled, which is a lot, but far different from the 47 times advertised.
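
For anyone wanting to check this, ws exposes the extension as a server option.
A sketch (the port number is arbitrary, and the `require` is commented out
since ws is a third-party package):

```javascript
// Options for ws's WebSocket.Server with the permessage-deflate
// extension turned off (its default has varied across ws versions,
// so set it explicitly when benchmarking memory):
const serverOptions = {
  port: 8080,               // arbitrary example port
  perMessageDeflate: false, // skip per-message compression entirely
};

// Usage (needs the third-party "ws" package installed):
// const WebSocket = require('ws');
// const wss = new WebSocket.Server(serverOptions);
```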

------
bfrog
I'm more interested in the TechEmpower-style benchmarks; at least those bear
some semblance to real-life usage: do some queries, return encoded JSON, etc.

------
qaq
I think the key benefit is actually significantly reduced memory footprint.

~~~
alexhultman
I agree; this is the one factor that is constant across all apps: your
long-lived sockets will require far less memory, which directly impacts the
number of long-polling clients you can have. Fast throughput is just a bonus.

------
notzorbo3
It's important to note that with all these "X requests per second" benchmarks,
they're almost never testing actual performance, but rather just fewer
features. The architecture (event loop, forking, threading, or any combination
of those) also matters a lot, but those serve _completely_ different purposes.

For example, they're using Apache as a reference point, but Apache does _so_
much more than their code example. For one thing, you'll want to try disabling
.htaccess support and static file serving so Apache doesn't actually hit the
disk, like their code example doesn't.

I've found it trivial to make Python perform on the order of dozens of
millions of requests per second, and I can keep scaling that basically
indefinitely. But all I'm really testing, as is the given code example in the
article, is a bit of looping and string manipulation.

~~~
lessclue
> I've found it trivial to make Python perform on the order of dozens of
> millions of requests per second

Really curious. How did you achieve that? When you say "dozens of millions",
it implies a minimum of 24+ million requests per second, which is quite
unbelievable.

~~~
notzorbo3
Through forking + gevent and then sleeping in each request handler. Of course
it measures nothing other than a whole bunch of while loops running in one
fork per CPU waiting for just about nothing. In other words, I'm benchmarking
"how much memory do I have", which is pointless. But it sure does scale!

~~~
lessclue
TCP handshakes -> HTTP parsing -> sleep -> response writing. Can the overhead
added by these (and more) really allow 24+ million requests/sec on a commodity
machine?

Could you share some examples or snippets?

~~~
mozumder
The way most Python web apps work is that they don't do the TCP handshake and
HTTP parsing themselves; they leave that up to the front-end web server
(nginx/Apache/etc.). Python only comes in via a FastCGI or WSGI proxy.

