

WebSocket benchmarks - gmcabrita
https://github.com/ericmoritz/wsdemo/blob/master/results.md

======
jerf
"I expected Go to kick Erlang's ass in the performance department but the
message latency was much higher than Erlang's latency and we had 225 unhappy
customers."

Go is neat and I hope to see it succeed and thrive, but it is simply
unavoidable that Erlang has existed for much longer, has been getting tuned
for much longer, and this sort of thing is its monomaniacal focus whereas Go
is spread a bit more thinly at the moment. Go is trying to be a systems
language, Erlang very much isn't.

~~~
lemming
What was surprising was how much faster the Java implementation was than Go. I
expected it to be competitive but I didn't expect it to perform much better.

~~~
zemo
His reporting is very misleading. If you look at the number of messages and
connections, Java only held ~5k connections, while Go held just shy of ~10k.
In the same amount of time, Java was only able to facilitate the transmission
of half the number of messages, so... it depends on what you mean by "fast".

~~~
stock_toaster
Interestingly, Go's "connection time" was by far the lowest.

He also tested with m1.medium instances, which are single-CPU. No mention of
whether the instances were 64-bit or 32-bit (this may matter for Go, as the GC
currently has some issues under 32-bit).

Tests are hard. Still nice to see a real-world-like comparison between some
popular stacks.

Edit: discussion on go-nuts: [https://groups.google.com/group/golang-nuts/browse_thread/th...](https://groups.google.com/group/golang-nuts/browse_thread/thread/6dc8468d6237f676)

~~~
ericmoritz
I used the m1.medium instance on 64bit Ubuntu 12.04. I will update the
results.md to note that.

I will also rerun all the tests on 2 core 64bit instances tonight.

------
halayli
When benchmarking language performance, it's good practice to explain why one
language is faster than another. In other words, profile and figure out the
bottleneck.

If you don't know why, then it's better not to post the results until you've
figured out the bottlenecks. There could be factors in your environment that
affect one language and not the other which you are not aware of. Publishing
such numbers will only confuse people.

~~~
borlak
My guess would be that the poor-performing implementations are using the naive
select() solution, instead of one of the more scalable mechanisms like epoll
or kqueue.

Pretty surprised to see node.js behave so poorly. We recently had discussions
at work about what server solution we should use for a multiplayer platform.
Since I had recently built a socket/connection-handling server in C, they
asked what I thought about node.js, and my only concern was that if the
underlying code uses select(), then it's going to be a poor choice... but I
don't think anyone got around to testing it.

Search for "C10K" for more information.
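The select()-versus-epoll distinction above can be sketched concretely. A minimal readiness loop in Python's `selectors` module (a stand-in for illustration, since the benchmarked servers are in other languages; assumes Linux or BSD, where `DefaultSelector` resolves to epoll or kqueue). The key difference: select() rescans every registered fd on each call, so cost grows with total connections, while epoll/kqueue return only the fds that are actually ready.

```python
# Sketch: one iteration of a readiness-based accept loop.
# selectors.DefaultSelector picks epoll (Linux) or kqueue (BSD) when
# available, falling back to select() elsewhere.
import selectors
import socket

sel = selectors.DefaultSelector()

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

# One non-blocking poll; a real server would loop forever. With no
# pending connections, this returns an empty list immediately.
events = sel.select(timeout=0)
for key, mask in events:
    conn, _ = key.fileobj.accept()
    sel.register(conn, selectors.EVENT_READ)

sel.close()
server.close()
```

(For what it's worth, node.js is built on an event library that uses epoll on Linux rather than raw select(), so the worry above is testable rather than settled.)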

~~~
halayli
The poller is a very important factor, but there are other things to consider
as well, depending on how fast you want to go. Sometimes 'fastest' is not what
you should be looking at, though.

Solutions like node.js, Python, etc. copy strings and buffers all the time,
and this can slow down the server considerably. They don't have an honest iov
layer that avoids copying data before passing it to writev, and you cannot
manage your own memory or create memory pools. These factors can have a bigger
impact on performance than selecting the right poller.
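The iov/writev point can be illustrated with a minimal sketch of scatter-gather output (in Python for consistency with the other sketches here, assuming a POSIX system with `os.writev`, Python 3.3+). The status line and headers are handed to the kernel as separate buffers in one syscall, so they never have to be concatenated (i.e. copied) into one contiguous string first:

```python
# Sketch: scatter-gather write via writev. The kernel consumes the
# buffers in order; nothing is joined in userspace beforehand.
import os

r, w = os.pipe()  # stand-in for a client socket

status = b"HTTP/1.1 204 No Content\r\n"
headers = b"Connection: keep-alive\r\n\r\n"

# One syscall, two buffers, zero userspace concatenation.
written = os.writev(w, [status, headers])
os.close(w)

data = os.read(r, 1024)  # the reader sees one contiguous byte stream
os.close(r)
```

A framework that only exposes `send(some_string)` forces you to build `status + headers` (a copy) before every write; an honest iov layer lets the pieces stay where they are.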

~~~
ajross
That doesn't sound right to me. There are 10k clients, each sending (and the
server echoing) one small timestamp record per second. An AWS medium instance
has access to half a DRAM pipe (frankly this test should fit entirely in L3
cache, but let's be conservative). The back of my envelope says that the RAM
bandwidth can handle about 100 kB of data copying per send or receive event.
That's a _staggeringly_ large amount of copying, literally thousands of times
larger than the size of the record.

Honestly I think CPU or syscall overhead is a more likely culprit.
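The envelope above, redone as arithmetic (the 4 GB/s usable-bandwidth figure is an assumption standing in for "half a DRAM pipe" on an older AWS medium instance):

```python
# Back-of-envelope: how much copying the memory bus allows per event.
clients = 10_000
events_per_client = 2            # one receive plus one echo per second
bandwidth = 4 * 10**9            # bytes/s -- an assumed figure

events_per_sec = clients * events_per_client
budget_per_event = bandwidth // events_per_sec
# budget_per_event == 200_000 bytes, the same order as the ~100 kB cited
# above -- thousands of times the size of a small timestamp record.
```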

~~~
halayli
You'd think.

Write a web server in C and a web server in Python/Erlang/Go/node.js. You'll
notice that the one written in C (or C++) runs twice as fast. If it were a
syscall bottleneck, then they'd achieve the same performance.

Zero-copy request handling is not possible in high-level languages without
hacking.

For a quick test, run httperf against the dumbest node.js web server that
returns a 204.

Then run the same test against an nginx server that does nothing but return a
204:

location / { return 204; }

Compare the difference.

------
Weltschmerz
This node.js version is most certainly faster:
[https://github.com/Weltschmerz/wsdemo/blob/master/competitio...](https://github.com/Weltschmerz/wsdemo/blob/master/competition/wsdemo.js)

I wonder by how much!

~~~
kcbanner
Hm, what performance impact does using arguments have?

~~~
Weltschmerz
Using .apply({}, arguments) is actually a bit slower, but either method is so
fast that it won't be a bottleneck. I use it in this case for elegance,
because I can. Note that in your original implementation this is impossible,
because you have to access the "type" variable and use one of two methods
(sendUTF or sendBytes). `ws` has only one method (send) and the parameters of
this method match the arguments to the callback for on('message').

------
mumrah
Not surprised to see Erlang come out on top. As I understand it, this is
precisely the type of problem Erlang is good at - doing lots of little
lightweight things with minimal overhead.

~~~
MetaCosm
Erlang is quietly doing heavy lifting all over the place: Amazon's SDB,
Facebook's chat, various high-speed trading systems, ejabberd, CouchDB,
RabbitMQ... and far more places, quietly and privately.

The good news is that other languages finally seem to be building out the
kind of tooling Erlang has (Akka, ZMQ, etc.).

------
lucian1900
Python network programs, especially ones with many connections, should use
Twisted.

A benchmark of anything but the best library/tools in each language is
pointless.

~~~
TazeTSchnitzel
One of the good Twisted WebSocket libraries is called txWS:
<https://github.com/MostAwesomeDude/txWS>

It wraps an existing TCP factory, and makes it work with WebSocket.

The entire API is a single function.

~~~
ericmoritz
Excellent. I wanted to write a Twisted implementation. Care to take a crack at
it and submit a pull request?

~~~
TazeTSchnitzel
Should I have time, I'll think about it.

~~~
ericmoritz
I can implement the server. Is there anything I should know about tuning
Twisted for this kind of benchmark?

~~~
Scramblejams
If you haven't yet, hit up #twisted on freenode for any tips you need. Lots of
heavy hitters there who are happy to help.

------
willvarfar
I'd love to see numbers for my hellepoll on that configuration.

<https://github.com/williame/hellepoll>

[http://williamedwardscoder.tumblr.com/post/18200335569/the-h...](http://williamedwardscoder.tumblr.com/post/18200335569/the-history-of-hellepoll)

[http://williamedwardscoder.tumblr.com/post/13590981677/perfo...](http://williamedwardscoder.tumblr.com/post/13590981677/performance-lessons-for-http-sockets)

------
ericmoritz
This initial release is not meant to be definitive. I welcome pull requests
for better implementations in your language/platform of choice and any
suggestions for better tuning Linux for these tests.

------
ajacksified
Pretty neat. I'm going to fork it and throw it against my C# implementation
(<http://www.alchemywebsockets.net>) on mono and see how it compares.

~~~
Liongadev
Would like to see that; please post it!

------
zurn
Looking at <https://github.com/ericmoritz/wsdemo>, it seems he's using his own
Erlang WS implementation and a third-party library for all the others?

If that's correct, then it's no big surprise that he does well in his own
benchmark, which he's developing against.

~~~
motiejus
No. He is using Cowboy[2], which _is_ the mainstream Erlang websocket
implementation (well, Cowboy is actually a socket acceptor pool that happens
to have awesome HTTP and WS handlers). The benchmark's handler is at [1].

[1]: [https://github.com/ericmoritz/wsdemo/blob/master/src/wsdemo....](https://github.com/ericmoritz/wsdemo/blob/master/src/wsdemo.erl)
[2]: <https://github.com/extend/cowboy>

------
riffraff
Did I miss something, or does the methodology lack a warm-up step?

What I mean is, I'd expect a first run of 5 minutes to warm up the servers,
then another one to actually gather the data.

Otherwise, while the averages will be mostly the same, numbers like the
dropped connections _could_ be misleading.

------
DanWaterworth
With websocket connections, provided the latency is low enough, memory usage
is a bigger concern.

