It's worth bearing in mind that messages per second is important, but it's easy to get fixated on benchmark porn.
Different queueing systems have different guarantees, disciplines and semantics. These affect user-facing behaviour, which tends to be more important at first blush than throughput.
I like zillions of messages per second as much as the next fellow. But frequently you need to worry about things like:
* Are messages delivered once and only once?
* Are messages delivered at least once?
* Can messages be dropped entirely under pressure? (aka best effort)
* If they drop under pressure, how is pressure measured? Is there a back-pressure mechanism?
* Can consumers and producers look into the queue, or is it totally opaque?
* Is queueing unordered, FIFO or prioritised?
* Is there a broker or no broker?
* Does the broker own independent, named queues (topics, routes etc) or do producers and consumers need to coordinate their connections?
* Is queueing durable or ephemeral?
* Is durability achieved by writing every message to disk first, or by replicating messages across servers?
* Is queueing partially/totally consistent across a group of servers or divided up for maximal throughput?
* Is message posting transactional?
* Is message receiving transactional?
* Do consumers block on receive or can they check for new messages?
* Do producers block on send or can they check for queue fullness?
And there's probably a bunch more I've forgotten.
The thing is that answers to these questions will fundamentally change both the functional and non-functional nature of your queueing system.
For example, a queue system giving best-effort, unordered, non-durable behaviour is going to run a lot faster. It also pushes a lot of work onto the application programmer. On the other hand, once-and-only-once, durable, consistent queues are a lot slower and screech to a halt under most partition conditions. But they also fit what most application developers expect to happen when they first encounter queueing systems.
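To make the "drop under pressure" versus "block on send" distinction concrete, here's a minimal single-process sketch using Python's standard `queue` module. The helper names `offer`, `poll` and `demo` are made up for illustration, and "pressure" is assumed to mean nothing more than queue depth; a real broker would measure it differently.

```python
import queue

def offer(q, msg):
    """Best-effort send: never blocks, reports False if the message was dropped."""
    try:
        q.put_nowait(msg)
        return True
    except queue.Full:
        return False

def poll(q):
    """Non-blocking receive: returns None when nothing is waiting.

    (A toy convention: it means None itself cannot be sent as a message.)
    """
    try:
        return q.get_nowait()
    except queue.Empty:
        return None

def demo():
    # Bounded queue: once 3 messages are waiting, the producer feels pressure.
    q = queue.Queue(maxsize=3)
    accepted = [offer(q, i) for i in range(5)]
    drained = []
    while True:
        m = poll(q)
        if m is None:
            break
        drained.append(m)
    return accepted, drained
```

Swapping `put_nowait` for a plain `q.put(msg)` turns the same queue into a back-pressure mechanism: the producer blocks until a consumer makes room, and nothing is dropped.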
I work on a section of Cloud Foundry in my day job, and other teams have seen that different tasks require different queueing approaches.
For example, stuff like metrics is still useful under conditions of dropped messages, out-of-order messages and so on, because what's interesting is the statistics, not any one single measurement.
But a message like "start this app" requires much higher guarantees of ordering, durability, delivery certainty. People get mad if your PaaS doesn't actually run the application you asked it to run.
So, just remember: queues are not queues. You need to compare delivered apples with lossy oranges.
As a note, the author observes that MQTT provides an option to select which delivery semantics you prefer (at-least-once, at-most-once / best-effort, once-and-only-once), but I can't see which one the benchmark is run for.
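As a rough illustration of why the choice matters, here's a toy simulation of the three semantics over a lossy link. Everything in it is an assumption for the sake of the sketch: the link is an iterator of booleans (True means that transmission got through), the function names are invented, and "exactly once" is approximated as at-least-once plus receiver-side de-duplication keyed on the message value. MQTT's QoS 2 actually uses a multi-step handshake, which this does not model.

```python
from itertools import cycle

def at_most_once(messages, link):
    """Fire and forget: one transmission per message, no retries."""
    return [m for m in messages if next(link)]

def at_least_once(messages, link, max_tries=10):
    """Resend until an ack arrives; a lost ack causes a duplicate delivery."""
    received = []
    for m in messages:
        for _ in range(max_tries):  # gives up silently after max_tries
            if not next(link):      # message lost in transit, retry
                continue
            received.append(m)      # receiver got a copy
            if next(link):          # ack made it back, sender stops
                break
    return received

def exactly_once_by_dedup(messages, link):
    """At-least-once delivery with duplicates filtered out by the receiver."""
    seen, out = set(), []
    for m in at_least_once(messages, link):
        if m not in seen:
            seen.add(m)
            out.append(m)
    return out
```

With a link that loses every second transmission out of three, `cycle([True, False, True])`, and messages `["a", "b", "c"]`, the first function loses "b", the second delivers "a" twice, and the third delivers each message once at the cost of extra round trips and receiver state.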
I'm at Labs in NYC on secondment to Cloud Foundry. You should come see us and do a tech talk!
> "Is queueing partially/totally consistent across a group of servers or divided up for maximal throughput"
I didn't do a good job of explaining this.
Basically, assuming the most popular queueing discipline -- FIFO -- you can either set up brokers to be highly available, or to scale approximately linearly, but not both.
This is because HA + FIFO + at-least-once queueing requires servers to coordinate the state of a queue and to either write to disk or replicate messages. It's really, really hard and it can ruin your day when there's a partition.
If you relax all the guarantees that make life easier for application developers, you can just send any old message to any old server. No server needs to coordinate with any other and so scaling is closer to linear.
Amazon's SQS is a really good case study in what queues with looser guarantees can achieve.
Thanks a lot for further specifying the cross-consistency guarantee. Basically the system I'm developing is a lot more like Amazon's SQS: it gives guarantees about message durability / replication, but only provides weak ordering, for three reasons: 1) What you said, scalability. 2) Availability, since the system is available in the minority partition as well. 3) My queue is fundamentally designed with the idea that messages must be acknowledged by consumers, or they are reissued after a job-specific retry time (that can be set to 0 if you really want at-most-once delivery, but it's a rare requirement). When you have auto-re-queue, strict ordering is basically useless, since it is violated every time a consumer does not acknowledge the message fast enough.
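A toy sketch of that ack-or-requeue idea, to show how it breaks strict ordering. This is not the actual implementation: the `RetryQueue` class, its logical `tick()` clock and the single-process design are all assumptions made for illustration.

```python
import collections

class RetryQueue:
    """In-memory queue that re-queues unacknowledged messages after `retry` ticks."""

    def __init__(self, retry):
        self.retry = retry
        self.ready = collections.deque()
        self.in_flight = {}  # message -> tick at which it is re-queued
        self.now = 0

    def put(self, msg):
        self.ready.append(msg)

    def get(self):
        msg = self.ready.popleft()
        if self.retry > 0:  # retry == 0 means never reissue (at-most-once)
            self.in_flight[msg] = self.now + self.retry
        return msg

    def ack(self, msg):
        self.in_flight.pop(msg, None)

    def tick(self):
        """Advance the logical clock and re-queue anything past its deadline."""
        self.now += 1
        for msg, deadline in list(self.in_flight.items()):
            if deadline <= self.now:
                del self.in_flight[msg]
                self.ready.append(msg)

def demo():
    q = RetryQueue(retry=2)
    for m in ("first", "second", "third"):
        q.put(m)
    processed = []
    q.get()                   # consumer A takes "first" and never acks it
    fast = q.get()            # consumer B takes "second" and acks promptly
    q.ack(fast)
    processed.append(fast)
    q.tick()
    q.tick()                  # "first" times out and goes to the back
    while q.ready:
        m = q.get()
        q.ack(m)
        processed.append(m)
    return processed
```

Here `demo()` processes "second", "third", "first": one slow consumer is enough to reorder the queue, even though every message is eventually handled.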
However, non-strict ordering does not mean random: it tries to approximate FIFO using each node's wall-clock timestamps. This means that while messages may be delivered out of order, usually what is queued first is served first. That is no guarantee at all from the POV of the developer, but it is a more general property: if there are N users waiting in a web application for something to get processed, the first in the queue will likely be served before the last.
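One way to picture timestamp-approximated FIFO, as a sketch only: each node keeps its own queue of `(timestamp, message)` pairs sorted by its local clock, and a reader merges them by timestamp. The function names, the two-node setup and the 150 ms clock skew are all invented for the example.

```python
import heapq

def merge_by_timestamp(*node_queues):
    """Merge per-node queues of (timestamp, message), each sorted locally."""
    merged = heapq.merge(*node_queues, key=lambda item: item[0])
    return [msg for _, msg in merged]

def demo():
    # True enqueue order is a1, b1, a2, b2 (at 100, 200, 300, 400 ms),
    # but node B's clock runs 150 ms behind, so its stamps are too early.
    node_a = [(100, "a1"), (300, "a2")]
    node_b = [(50, "b1"), (250, "b2")]
    return merge_by_timestamp(node_a, node_b)
```

With the skewed clock the merged order comes out as b1, a1, b2, a2 rather than the true a1, b1, a2, b2: close to FIFO, with each node's own messages still in order, but nothing a developer could rely on as a guarantee.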
p.s. sorry for errors, writing with my daughter pulling my arms :-)
+ Latency vs throughput.
What would be really informative (and this is not remotely directed only at Jian Zhen's work here) is to see metrics across a spectrum, for a given modality, on a standard topology and capacity. That way it would be clear almost at a glance which approach is the best fit for a given domain.
Also I wonder if the time has come for the software geeks to visit their fellow geeks in the Mech-E departments and check out the work to date on flow analysis … ;)
Tyler Treat, who wrote an excellent blog post on MQ performance, tested this and got a mean latency of 98.751015 ms for 1000000 messages.
Have you ever read this? http://zedshaw.com/archive/programmers-need-to-learn-statist...
But if, for example, you wanted to send it into SIGMETRICS for publication, they'd expect a whole bunch of extra work and background material.
One thing that's tricky in our profession is that the space of all possible configurations and inputs is gigantic and we only tend to sample the very, very small parts of it that occur to us.
My dream would be to build a framework which orchestrates testing across distributed senders/receivers. That would give you a much more accurate representation of performance and reliability instead of a single machine.
Thanks for the detailed response. Here are some answers that will hopefully help clarify things a bit.
A: MQTT allows QoS 0 (at most once), 1 (at least once), and 2 (exactly once.) The performance numbers in the blog are for QoS 0.
However, SurgeMQ implements all three and there are unit tests for all three. I just haven't done the performance tests for QoS 1 and 2.
SurgeMQ supports it though the numbers posted are for QoS 0 (at most once)
No. Currently no messages are dropped.
Not sure what this means..sorry..
Ordered, FIFO... The MQTT spec requires that messages from publishers be delivered to subscribers in the same order.
The broker uses topics to route. A publisher publishes to a topic; subscribers subscribe to multiple topics w/ optional wildcards.
Ephemeral currently. Though MQTT spec requires that any unack'ed QoS 1 and 2 messages be redelivered when the server restarts or client reconnects. So once SurgeMQ meets that spec, it could be considered somewhat durable.
Currently SurgeMQ is a single server w/ no clustering ability. However, MQTT spec does mention the bridge capability that is a poor man's cluster. Not yet implemented.
QoS 1 and 2 start to be more transactional. QoS 0 is strictly fire and forget.
Block on receive.
Block on send.
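For anyone unfamiliar with the topic wildcards mentioned above, here's a small sketch of MQTT-style filter matching, where `+` matches exactly one topic level and `#` matches everything from that level down. The function name `topic_matches` is made up, this is not SurgeMQ's code, and it skips details a real broker needs, such as validating that `#` only appears last and the special handling of topics starting with `$`.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT-style topic filter matches a concrete topic name."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            # Multi-level wildcard: matches the remaining levels, including none.
            return True
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if f != "+" and f != t_levels[i]:
            return False  # literal level mismatch ("+" accepts any one level)
    # No "#" was seen, so both sides must have the same number of levels.
    return len(f_levels) == len(t_levels)
```

So a subscriber to `sport/tennis/+` would get messages published to `sport/tennis/player1` but not `sport/tennis/player1/ranking`, while `sport/#` would get both.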
I wasn't expecting to make you fill out a survey. I was just trying to show off scars I and others I know have accumulated over the years :)
> Not sure what this means..sorry..
I'm definitely an amateur on this topic, but I suppose he's talking about privacy issues. If you're delivering secrets, you don't want every actor to be able to read the messages of all the others.
Some queues let you "peek". It's not common because it weakens the whole concept of a queue and tends to be difficult to implement sanely.
Disclaimer: RabbitMQ belongs to Pivotal, the same company I work for.
There's really no comparison tbh. RabbitMQ has been battle tested for years and SurgeMQ is just in its infancy. And RabbitMQ has a lot more enterprise security features that SurgeMQ doesn't have today. So if you are looking for a solution today, go w/ RabbitMQ.
SurgeMQ hopefully will get there someday, but it's not ready yet.