Hacker News
Dissecting Message Queues (bravenewgeek.com)
79 points by swah on Aug 17, 2014 | 15 comments

You separate your load generation tools from your system under test to eliminate confounding factors of shared resources like CPU, thread scheduler, memory allocator, disk, etc. This benchmark doesn’t do that, which means it’s impossible to distinguish between a queue system which is being saturated with work and a queue system which is out-competing the load generators for CPU. Also, it’s a laptop running OS X. Do you plan on fielding a queue system in a DC built out with Macbooks? No? Then this benchmark might as well be on a phone for all the inferential value it provides to people running Linux servers.

A single producer and single consumer means essentially zero contention on either side in most implementations. How well does this scale to your actual workload? What’s the overhead of adding a producer or a consumer? What’s the saturation point, or the sustainable throughput according to the Universal Scalability Law? It’s impossible to tell, since this is a data point of one, which means it can’t distinguish between a system that has a mutex around accepting producer connections and a purely lock-free system. And that’s a shame, because those two systems would have very different behaviors in production environments with real workloads.
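To make the USL point concrete, here is a rough sketch of its throughput model, X(N) = λN / (1 + σ(N−1) + κN(N−1)), where σ is the contention penalty and κ the coherency penalty. The coefficients below are illustrative, not measured from any real queue:

```python
# Universal Scalability Law: predicted throughput at N concurrent clients.
# lam = single-client throughput, sigma = contention, kappa = coherency cost.
# All coefficients here are made up for illustration.

def usl_throughput(n, lam=1000.0, sigma=0.05, kappa=0.0005):
    """X(N) = lam*N / (1 + sigma*(N-1) + kappa*N*(N-1))"""
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

for n in (1, 8, 64, 512):
    print(f"{n:4d} clients -> {usl_throughput(n):8.0f} msg/s")
```

With a nonzero κ the curve peaks and then degrades, which is exactly the behavior a one-producer/one-consumer benchmark can never reveal.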

Finally, measuring the mean of the latency distribution is wrong. Latency is never normally distributed, and if they recorded the standard deviation they’d notice it’s several times larger than the mean. What matters with latency are quantiles, ideally corrected for coordinated omission (http://www.infoq.com/presentations/latency-pitfalls).
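A quick illustration of why the mean misleads on a skewed latency distribution; the sample values below are fabricated for the example, not taken from the article's runs:

```python
# Skewed latency sample (ms): mostly fast, with a heavy tail.
# These numbers are illustrative only.
from statistics import mean, quantiles

latencies_ms = [1.0] * 95 + [50.0, 100.0, 250.0, 500.0, 1000.0]

# quantiles(n=100) yields the 1st..99th percentile cut points.
pcts = quantiles(latencies_ms, n=100)
p50, p99 = pcts[49], pcts[98]

print(f"mean={mean(latencies_ms):.2f}ms  p50={p50}ms  p99={p99}ms")
```

The mean lands at ~20 ms even though the typical (median) request takes 1 ms and the tail reaches a full second; only the quantiles describe what users actually experience.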

This is not a benchmark, this is a diary entry.

Not only that, but there is no attempt made to gather metrics based on similar configurations / feature sets. The author specifically mentions that the AMQP systems persist their messages to disk by default, and this was the configuration used for the testing. How, then, are the "benchmarks" even comparable to the ephemeral message queues that don't provide any sort of persistence? Why wasn't persistence turned off to provide more comparable tests?

And why were nanomsg / 0mq even included?

I despise articles like this. They only serve to clutter up useful communication on such technologies.

Persistence was disabled. Nano and ZMQ are in their own group and aren't even attempted to be compared to other groups.

Do you know of any similar posts/links/sources that have more accurate & realistic benchmarks of the same software / messaging queues? That'd be immensely helpful...

We're attempting to optimize this aspect of our stack currently, and I'm sure many others face very similar challenges right now. It's proven to be quite difficult & time-consuming to accurately measure this stuff -- any insight into more accurate/reasonably realistic benchmarks of this type of MQ software would be awesome. :-)

I think to find out you'd need to at least measure on the same OS and hardware. A lot of things happen between the physical hardware and the kernel socket layer and those might be different between operating systems.

Some of the stuff is difficult and time-consuming because "messaging" is generic enough to be configured and used differently by different users.

Obviously you can cut away some of the choices right off the bat if you need support for a particular OS (say you have to ship on HP-UX), or you need durability, acknowledgements, and high availability, or you want a project with a certain level of maturity and stability, and so on. That cuts down the number of systems to test.

Then of course there are questions like: how do they handle concurrency? Just because a single producer and single consumer can do 500K messages per second (which a small benchmark on a co-worker's laptop might show), doesn't mean the whole thing won't blow up and crash in a burning mess with 1000 consumers and producers.

Kudos for summing that up nicely. Collecting and interpreting data for studies comes with great responsibility.

Yes, you're correct. "Benchmark" is a very unfortunate misnomer.


It's almost as if bad experimental design makes a blog article not worth reading.

Thanks for the link, do you have more of those?

> What's interesting, however, is the disparity in the sender-to-receiver ratios. ZeroMQ is capable of sending over 5,000,000 messages per second but is only able to receive about 600,000/second.

IIRC the sent messages get stuffed in a buffer in the sender process. If the sender process crashes it will take the already-sent-but-not-really messages with it.

My understanding is that the "foreground" thread gives the message to a background ZeroMQ thread, which then transmits them as fast as it can; therefore, this number is probably somewhat misleading, but I'm not sure how else you could measure it.
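That foreground/background split can be modeled with a toy in-process queue; this is a simplified model of the ZeroMQ-style send path, not the actual libzmq implementation:

```python
# Toy model of an async send path: the application thread only enqueues
# to an in-process buffer; a background I/O thread drains it onto the
# "wire". Sender-side counts therefore overstate delivered throughput.
import queue
import threading
import time

buf = queue.Queue()
delivered = []

def io_thread():
    # Drain the buffer slowly, like a saturated socket.
    while True:
        msg = buf.get()
        if msg is None:
            break
        time.sleep(0.001)  # simulated per-message wire time
        delivered.append(msg)

t = threading.Thread(target=io_thread)
t.start()

start = time.monotonic()
for i in range(100):
    buf.put(i)          # "send" returns as soon as the message is buffered
enqueue_time = time.monotonic() - start

buf.put(None)           # sentinel: shut down the I/O thread
t.join()
total_time = time.monotonic() - start

print(f"enqueued 100 msgs in {enqueue_time*1000:.1f}ms, "
      f"delivered {len(delivered)} in {total_time*1000:.1f}ms")
```

Counting at the `buf.put` side gives the millions-per-second "send rate"; counting at `delivered` gives the much lower rate the receiver actually sees.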

I might have missed this in the article, but were the brokered queues tested in or close to their default configuration? ActiveMQ in particular needs to be tweaked to get decent performance out of it in a production environment.

A little sad it doesn't cover gearman. Dead simple, easy to use, easy to get going, super flexible.

No discussion of MSMQ?

Almost no massively distributed applications are written on Windows, so I can see how it would be overlooked.

