

~2M msgs/second messaging system written in Go - timf
https://gist.github.com/4227635

======
eps
How big is _defaultSendBufSize_?

If it's something like 1 Gig, then it's the OS that should be commended for
the miracle of throughput, not Go. Even VB would be able to pull numbers like
these with aggressive buffering.

A more sensible metric would be to measure the throughput _and_ the longest
time in transit. If you get Go to deliver 2 mil/sec with sub-ms delivery time,
then we'll have something to talk about.
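
A minimal sketch of measuring both numbers at once, assuming an in-process buffered channel as a stand-in for the socket (the names `measure` and `result` are hypothetical and not from the gist). Each message carries its send timestamp, and the receiver tracks the longest time any message spent in transit:

```go
package main

import (
	"fmt"
	"time"
)

// result holds both metrics: how many messages arrived in how long,
// and the longest time any single message spent in transit.
type result struct {
	Count      int
	Elapsed    time.Duration
	MaxTransit time.Duration
}

// measure pushes n timestamped messages through a channel with the given
// buffer size. The channel is an assumption standing in for the real socket.
func measure(n, buf int) result {
	ch := make(chan time.Time, buf)
	done := make(chan result)
	start := time.Now()
	go func() {
		var res result
		for sent := range ch {
			if d := time.Since(sent); d > res.MaxTransit {
				res.MaxTransit = d
			}
			res.Count++
		}
		res.Elapsed = time.Since(start)
		done <- res
	}()
	for i := 0; i < n; i++ {
		ch <- time.Now()
	}
	close(ch)
	return <-done
}

func main() {
	// Compare a tiny buffer with a large one.
	for _, buf := range []int{1, 1 << 16} {
		r := measure(1000000, buf)
		fmt.Printf("buf=%d: %.0f msgs/sec, longest transit %v\n",
			buf, float64(r.Count)/r.Elapsed.Seconds(), r.MaxTransit)
	}
}
```

Running it with different buffer sizes shows how the two figures move relative to each other; the actual numbers depend entirely on the machine.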

~~~
krobertson
I work with Derek at Apcera. The send buffer in the test is 16 KB.

------
moe

      $ time seq 5000000 >/dev/null
      1.10s user 0.00s system 99% cpu 1.107 total
    

There, my Mac Mini can push ~5M msgs/second!!1one

Do I get a pony now?

Seriously, what is this doing on HN and what on earth are people discussing?

If you want to brag about benchmarks, then how about providing at least a
remote clue about what you are measuring...

~~~
derekcollison
The simple benchmark is testing throughput of a messaging system, specifically
a new server written in Go. Both the client and server are on the same
machine, but going over a tcp/ip socket. The Pub benchmarks send messages and
then make sure the connection is flushed, meaning all messages have been
completely processed by the server. Processing in this case means framing,
protocol and routing logic. The PubSub versions test sending and receiving all
the messages back in the client, with a few variations on using multiple
connections and distributed queuing.

~~~
moe
Thanks, that's much more informative!

I'm looking forward to the full suite being released so we can repeat the
tests and see how it fares under real-world conditions with different message
sizes etc.

Sorry for the snark in my initial comment but I found this gist really lacking
(akin to those press releases where $vendor brags about some arbitrary figure
without providing any details).

I do like the simplicity of NATS and look forward to a fast server
implementing it. However, for something like an MQ, where performance is _the_
key metric and highly dependent on the chosen workload, one really shouldn't
throw around numbers without backing them up thoroughly; that only hurts
credibility.

Please take as a guide the write-ups by the RabbitMQ guys, who publish the
source code for all their benchmarks and go to great lengths explaining them:

<http://www.rabbitmq.com/blog/2012/04/25/rabbitmq-performance-measurements-part-2/>

------
halayli
Hmm, you cannot make 2M read/write syscalls a sec. So what's the definition of
a message here, and what transport medium does it go through?

~~~
Udo
The code sample suggests it's a network interface at localhost, but it's _not_
2M syscalls or connections. It's just one big buffered write and it doesn't
look like they're waiting for responses to those messages - so essentially
this might just be a (pretty good?) stream processing app.

~~~
derekcollison
It is not one big write, but optimizations around msgs/write using buffering
are used in clients, with obvious care to balance latency and throughput. In
the benchmark, the write buffer is 16k, so it is flushed automatically via
Go's bufio when it hits that mark, and then I flush it again when the loop is
complete, flushing the remainder of the outbound buffer. I then use a
PING/PONG, which is part of the NATS protocol, to only stop timing when the
PONG returns, and I know all messages have been processed. NATS does have a
verbose protocol flag that has all protocol frames ack'd with either +OK or
-ERR.
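
The loop described above can be sketched roughly as follows. This is not the benchmark from the gist: `countingServer`, `publishAndWait` and `runBench` are hypothetical names, the server is a stand-in that only counts frames and answers PING, and the payload is assumed to contain no newlines so that line-based reading is enough. The 16 KB `bufio` writer means each write to the socket carries about 900 of these 18-byte frames rather than one.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"time"
)

// countingServer is a hypothetical stand-in for the real server: it counts
// PUB frames and answers PING with PONG. It does no routing at all.
func countingServer(ln net.Listener, counted chan<- int) {
	conn, err := ln.Accept()
	if err != nil {
		counted <- -1
		return
	}
	defer conn.Close()
	r := bufio.NewReader(conn)
	n := 0
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			counted <- n // client closed the connection
			return
		}
		switch {
		case strings.HasPrefix(line, "PUB "):
			// The payload line follows the PUB control line.
			if _, err := r.ReadString('\n'); err != nil {
				counted <- n
				return
			}
			n++
		case strings.HasPrefix(line, "PING"):
			fmt.Fprint(conn, "PONG\r\n")
		}
	}
}

// publishAndWait sends n messages through a 16 KB buffered writer, flushes
// the remainder, and stops the clock only when the PONG comes back.
func publishAndWait(addr string, n int) (time.Duration, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	w := bufio.NewWriterSize(conn, 16*1024) // flushed automatically when full
	r := bufio.NewReader(conn)
	start := time.Now()
	for i := 0; i < n; i++ {
		if _, err := w.WriteString("PUB foo 5\r\nhello\r\n"); err != nil {
			return 0, err
		}
	}
	if _, err := w.WriteString("PING\r\n"); err != nil {
		return 0, err
	}
	if err := w.Flush(); err != nil { // push out whatever is still buffered
		return 0, err
	}
	reply, err := r.ReadString('\n')
	if err != nil {
		return 0, err
	}
	if reply != "PONG\r\n" {
		return 0, fmt.Errorf("unexpected reply %q", reply)
	}
	return time.Since(start), nil
}

// runBench wires the two together over a loopback TCP socket.
func runBench(n int) (count int, elapsed time.Duration, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, 0, err
	}
	counted := make(chan int, 1)
	go countingServer(ln, counted)
	elapsed, err = publishAndWait(ln.Addr().String(), n)
	ln.Close()
	count = <-counted
	return count, elapsed, err
}

func main() {
	count, elapsed, err := runBench(1000000)
	if err != nil {
		fmt.Println("benchmark failed:", err)
		return
	}
	fmt.Printf("%d msgs in %v (%.0f msgs/sec)\n",
		count, elapsed, float64(count)/elapsed.Seconds())
}
```

Because the PING is written after the last PUB on the same connection, the PONG can only come back once the server has read every message before it.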

------
marshray
Not bad at all, that's approximately the message-passing overhead I measured
in C++ on a similar CPU a while back.

I think the main utility for such a benchmark though is to establish a lower
limit on theoretical per-message overhead. Any practical system is likely to
want to do something interesting with the content of the messages.

But this lets us say "expend an average of at least 5 us of useful computation
on each message in order to keep the overall cost of message passing below
10%".
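
The arithmetic behind that figure can be sketched as below, assuming the per-message cost is simply the inverse of the ~2M msgs/second throughput in the title (0.5 µs per message). If overhead / (overhead + work) must stay at or below a fraction f, then work must be at least overhead × (1 − f) / f. The function name is hypothetical.

```go
package main

import "fmt"

// minUsefulWorkMicros returns the least useful work per message, in
// microseconds, that keeps message passing at or below maxOverhead (a
// fraction between 0 and 1) of total time. It assumes each message costs
// 1/msgsPerSec seconds to pass, which is a simplification.
func minUsefulWorkMicros(msgsPerSec, maxOverhead float64) float64 {
	overhead := 1e6 / msgsPerSec // microseconds per message
	return overhead * (1 - maxOverhead) / maxOverhead
}

func main() {
	fmt.Printf("%.1f us\n", minUsefulWorkMicros(2e6, 0.10))
}
```

For 2M msgs/second and a 10% budget this comes to about 4.5 µs, in line with the "at least 5 us" quoted above.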

~~~
derekcollison
That is correct, I am only attempting to measure the efficiency of the
messaging processing engine within the server. I did want to include the
network stack and the buffering portion, as well as the framing, protocol
parser, and the subject based routing.

------
jamwt
zeromq/czmq equivalent: <https://gist.github.com/4229625>

Does 3M+ messages a second over tcp/loopback in my test.

(The fact that this Go implementation is competitive is pretty sweet.)

------
jlouis
Slowly, Go is getting into more and more places. This is yet another nice
replacement. It is probably going to be way faster than the Ruby
implementation in the long run.

------
paulasmuth
Mh, if I understand correctly the messaging system being tested is
<https://github.com/derekcollison/nats> and it is not written in Go but in
Ruby (+EventMachine). Or is there a Go version of NATS?

Also, this is only testing the time it takes the Go client to write the
messages to the socket, not the time the server takes to process the messages.
So the benchmark would be the same with a noop server that reads and discards
all incoming traffic. Am I wrong?

~~~
timtadh
I am not the author of the gist but quoting from it: "our work on a high
performance NATS server in Go." Furthermore, the tests he shows are run using
Go's testing tool (which tests Go code). Finally, he discusses "no use of
defer", a Go language feature not in Ruby. So yes, I believe it is an
implementation in Go.

~~~
randomdata
I agree with your analysis, but EventMachine does use a _defer_ method to push
long-running work off of the event loop.

~~~
4ad
Defer in Go means something very specific and completely different:

<http://golang.org/ref/spec#Defer_statements>

<http://golang.org/doc/effective_go.html#defer>
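
A minimal illustration of what those links describe (the function name `trace` is made up for this example): deferred calls run when the surrounding function returns, in last-in-first-out order, and they can still modify named results.

```go
package main

import "fmt"

// trace records the order in which the function body and its deferred
// calls run. The deferred closures append to the named result after the
// return statement has executed.
func trace() (order []string) {
	defer func() { order = append(order, "first defer") }()
	defer func() { order = append(order, "second defer") }()
	order = append(order, "body")
	return order
}

func main() {
	fmt.Println(trace()) // prints [body second defer first defer]
}
```

This is unrelated to EventMachine's `defer`, which hands work to a thread pool. Avoiding Go's `defer` in a hot path was presumably a micro-optimization, since each deferred call carried some runtime cost in Go at the time.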

~~~
randomdata
But EventMachine#defer would make sense in this context, given that the Ruby
version is written with EventMachine. That is why I noted it.

------
jasonmoo
I'm curious to see if this is running with GOMAXPROCS above 1. I've seen the
scheduler start to drag down reqs/sec with more than one thread in lightweight
networking services like this.

~~~
derekcollison
It is not, but it's on my TODO list, and I have also observed similar behavior
at times. I have taken care to make sure the synchronization is efficient, but
running the test is needed to get the real results.
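
A sketch of how such a comparison could be set up, assuming a toy workload rather than the real server: `timeWithProcs`, `procs` and `pingPong` are hypothetical helpers, and only `runtime.GOMAXPROCS` and `runtime.NumCPU` are real library calls.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// procs reports the current GOMAXPROCS setting without changing it.
func procs() int { return runtime.GOMAXPROCS(0) }

// timeWithProcs runs work with GOMAXPROCS set to n, restores the previous
// setting afterwards, and returns how long the work took.
func timeWithProcs(n int, work func()) time.Duration {
	prev := runtime.GOMAXPROCS(n)
	defer runtime.GOMAXPROCS(prev)
	start := time.Now()
	work()
	return time.Since(start)
}

// pingPong is a toy workload: two goroutines hand a token back and forth
// over unbuffered channels, so the cost is dominated by scheduling.
func pingPong(rounds int) {
	a, b := make(chan struct{}), make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < rounds; i++ {
			<-a
			b <- struct{}{}
		}
	}()
	for i := 0; i < rounds; i++ {
		a <- struct{}{}
		<-b
	}
	wg.Wait()
}

func main() {
	for _, n := range []int{1, runtime.NumCPU()} {
		d := timeWithProcs(n, func() { pingPong(200000) })
		fmt.Printf("GOMAXPROCS=%d: %v\n", n, d)
	}
}
```

Whether the multi-threaded run is slower or faster depends on the Go version and the hardware, so the numbers from this toy say nothing definite about the server itself.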

------
perplexes
What does NATS achieve that isn't available by using RabbitMQ or 0MQ?

------
onetwothreefour
This isn't impressive if it's in memory. Which it most probably is.

~~~
bascule
It seems like if you spent two seconds reading the description in the gist
you'd clearly see it wasn't.

2 million messages per second is a pretty respectable number, and fairly
comparable to systems like distributed Erlang and Akka.

~~~
gtani
the shootout says

<http://shootout.alioth.debian.org/u64q/performance.php?test=threadring>

~~~
igouy
about thread-switching not PubSub

------
stevewilhelm
I had the privilege of working with Derek "back in the day." He has worked on
some pretty impressive mission critical, low latency message systems.

------
bryogenic
NATS is new to me, are there interfaces for anything other than Ruby, Go,
Node, and Java? C or Python? Any other resources about it? Thanks!

~~~
majke
NATS[1] is a relatively simple pub-sub protocol and broker created by Derek
Collison. It's originally written in Ruby. It was created as the messaging
layer of Cloud Foundry.

From the technical side: routing is done using regular expressions, so the cost
of internal routing grows in proportion to the number of different subscriptions.
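
A rough sketch of why that cost is linear, assuming each subscription is compiled to a regular expression and every published subject is checked against all of them. This is not the actual NATS code: the `router` type and `compile` helper are made up for illustration. The wildcards follow NATS subject conventions, where `*` matches one token and `>` matches the rest of the subject.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compile turns a subject pattern into a regular expression.
// "*" matches exactly one dot-separated token, ">" matches one or more
// trailing tokens. The sketch does not check that ">" comes last.
func compile(pattern string) *regexp.Regexp {
	toks := strings.Split(pattern, ".")
	for i, t := range toks {
		switch t {
		case "*":
			toks[i] = `[^.]+`
		case ">":
			toks[i] = `.+`
		default:
			toks[i] = regexp.QuoteMeta(t)
		}
	}
	return regexp.MustCompile("^" + strings.Join(toks, `\.`) + "$")
}

type sub struct {
	pattern string
	re      *regexp.Regexp
}

// router keeps subscriptions in a flat list.
type router struct{ subs []sub }

func (r *router) subscribe(pattern string) {
	r.subs = append(r.subs, sub{pattern, compile(pattern)})
}

// match tests the subject against every subscription, so its cost grows
// linearly with the number of subscriptions.
func (r *router) match(subject string) []string {
	var hits []string
	for _, s := range r.subs {
		if s.re.MatchString(subject) {
			hits = append(hits, s.pattern)
		}
	}
	return hits
}

func main() {
	r := &router{}
	r.subscribe("foo.*")
	r.subscribe("foo.>")
	r.subscribe("bar")
	fmt.Println(r.match("foo.bar"))     // prints [foo.* foo.>]
	fmt.Println(r.match("foo.bar.baz")) // prints [foo.>]
}
```

A trie keyed on subject tokens would avoid the full scan, at the price of more bookkeeping on subscribe and unsubscribe.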

Additionally, NATS has some interesting specific features. For example, if I
remember correctly, messages for slow consumers are just dropped.

NATS was originally a monolithic server, but I see that some work on
clustering has been done [2].

[1] <https://github.com/derekcollison/nats>

[2] <https://github.com/derekcollison/nats/wiki/Cluster-Design>

------
michaelfeathers
The language isn't the important thing:
<http://martinfowler.com/articles/lmax.html>

------
segmond
~2 M msgs/second means nothing without telling us about the specs of the
system.

