
Benchmarking Message Queue Latency - tylertreat
http://bravenewgeek.com/benchmarking-message-queue-latency/
======
brobinson
Would love to see NSQ as part of this.

------
calpaterson
What sensible scenarios are there for sending 1MB messages over a message bus?

~~~
justinsaccount
What is your idea of the arbitrary threshold at which one should no longer
use a message queue?

~~~
brobinson
I don't know of a good threshold, but you should generally favor small
messages -> big computations. If a 1MB message isn't handled properly, it
could be retransmitted, etc. In the case of a 1MB message, it sounds like that
data should be serialized into a database and the ID (or whatever is needed to
access it) passed along in the message instead.

It's like emailing a 10MB PowerPoint instead of throwing it on Dropbox and
sending a link to it.
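
A minimal sketch of that pattern (sometimes called a "claim check") in Go,
assuming a generic blob store and a generic publish call; the BlobStore,
Queue, and publishLarge names here are hypothetical, not any particular
library:

    package claimcheck

    import (
        "crypto/sha256"
        "encoding/hex"
    )

    // BlobStore stands in for a database or object store (Postgres, S3, ...).
    type BlobStore interface {
        Put(key string, data []byte) error
    }

    // Queue stands in for whatever message bus is in use.
    type Queue interface {
        Publish(topic string, msg []byte) error
    }

    // publishLarge writes the payload to the blob store and puts only its
    // key on the bus, keeping the message itself small.
    func publishLarge(store BlobStore, q Queue, topic string, payload []byte) error {
        sum := sha256.Sum256(payload)
        key := hex.EncodeToString(sum[:])
        if err := store.Put(key, payload); err != nil {
            return err
        }
        // Consumers fetch the full payload from the store using this key.
        return q.Publish(topic, []byte(key))
    }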

------
dpw
A few things about the article that made me think "hmmmm":

No mention of testing set-up. Was the test client running on a different
machine from the server? What kind of machines? What kind of network?

Many of the charts have the same "ballooning" shape, despite measuring very
different systems. I think this is due to the "attempt to correct coordinated
omission by filling in additional samples". As I understand it, all charts but
the first have this correction applied (and it does sound like it is applied
by manipulating the data, not by altering the measuring method). To understand
the effect this might have, imagine testing a system that has a single request
queue by making requests on a regular schedule, say at 1ms intervals. And most
of the time, these take much less than 1ms. But one request is an outlier and
takes 100ms. What will the "corrected" results look like? The worst case will
be 100ms. The second worst case will be 99ms. The third worst case will be
98ms, etc. On a linear horizontal scale, this would give us a linear slope at
the right hand side of the chart. Change to a logarithmic horizontal scale,
and you get a chart with the shape seen in many of the charts in this article.
This makes it impossible to tell whether the worst cases are due to a small
number of outliers or not. I believe that the correction is well-meaning, but
I think the uncorrected results would be more informative.
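
To make that concrete, here is a toy reconstruction in Go of the kind of
fill-in correction I am describing (my reading of the approach, not the
article's actual code): a single 100ms outlier among requests sent at 1ms
intervals turns into roughly a hundred synthetic samples of 99ms, 98ms, and
so on, which is exactly the linear tail that becomes the ballooning shape on
a logarithmic axis.

    package main

    import (
        "fmt"
        "time"
    )

    // correct fills in the samples that "would have" been recorded if
    // requests had kept being issued at the intended interval while one
    // request was stalled.
    func correct(samples []time.Duration, interval time.Duration) []time.Duration {
        out := make([]time.Duration, 0, len(samples))
        for _, s := range samples {
            out = append(out, s)
            for filled := s - interval; filled > 0; filled -= interval {
                out = append(out, filled)
            }
        }
        return out
    }

    func main() {
        // 999 fast requests and a single 100ms outlier, issued every 1ms.
        samples := make([]time.Duration, 0, 1000)
        for i := 0; i < 999; i++ {
            samples = append(samples, 200*time.Microsecond)
        }
        samples = append(samples, 100*time.Millisecond)

        corrected := correct(samples, time.Millisecond)
        // The one outlier now contributes 99 extra samples: 99ms, 98ms, ...
        fmt.Println(len(corrected))
    }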

The use of line charts is a bit odd. They are connected to the origin, which
is obviously a fiction. They are also slightly smoothed - where steps are
visible, the steps have a gradient rather than being a vertical line. Where
the number of data points is low, this leads to odd effects: In the two 1MB
charts, the right third of the chart is just showing the value of a single
data point! A scatter plot might give the reader a more honest impression.

The logarithmic horizontal scale of those charts tends to focus attention on
the worst cases. That's not unreasonable - in some contexts, that's what you
really care about. But outliers might occur due to environmental effects like
kernel scheduling, VM scheduling, dropped packets on a noisy network etc.,
unless you make an effort to prevent such things. And it makes it very hard to
see the typical values on the charts for RabbitMQ and Kafka where the range of
Y values is large. Can you tell what the median latency is for RabbitMQ or
Kafka at any message size? It looks like about 0.5ms to me, but it's hard to
read it from any of the charts.

The number of messages involved is different for different message sizes. You
can see that from the way the 1MB charts are stepped, but the charts for
smaller message sizes are smoothed. For 1MB messages, it looks like there are
5k or 10k samples on the charts. For the smaller message sizes, probably far
more. Were all the tests run for roughly the same amount of time? Tests run
for longer might see more outliers due to the environment.

"The 1KB, 20,000 requests/sec run uses 25 concurrent connections". With the
implication that other test runs had different levels of concurrency. So what
were they? What was the impact of changing the concurrency levels while the
message size/rate was constant?

Is it possible that the client program making the measurements was introducing
any artefacts (for example, being written in Go, did it encounter any GC
pauses?). It would be interesting to see the results of measurements
against a simple TCP echo server, as a control.
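
The control could be as small as this (a sketch of my own, not anything from
the article): a plain TCP echo server that the same benchmark client can be
pointed at, so that any latency it still reports is overhead from the client,
the OS, or the network rather than from a broker.

    package main

    import (
        "io"
        "log"
        "net"
    )

    func main() {
        ln, err := net.Listen("tcp", ":9000")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                log.Print(err)
                continue
            }
            go func(c net.Conn) {
                defer c.Close()
                // Echo every byte straight back to the sender.
                io.Copy(c, c)
            }(conn)
        }
    }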

My criticisms may seem too harsh. It is too much to expect someone to spend
weeks doing rigorous measurements, and the resulting article would be so long
that hardly anyone would read all of it (sounds like academia!). Someone might
say that I should do my own experiments if I think I can do them better; but I
have a day job too. I don't want to discourage the author; I think it is good
that the author did the work he did, and put it up for everyone to see. But
when articles like this get linked on HN and read by lots of people, they can
easily get regarded as conclusive. Ideas about the performance of various
projects get established that might not be well-founded and can take years to
dispel. So all I'm saying is, reader beware!

