
There are some typos and broken links. I like the code, though; a lot of work has been done.

It's worth bearing in mind that messages per second is important, but it's easy to get fixated on benchmark porn.

Different queueing systems have different guarantees, disciplines and semantics. These affect user-facing behaviour, which tends to be more important at first blush than throughput.

I like zillions of messages per second as much as the next fellow. But frequently you need to worry about things like:

* Are messages delivered once and only once?

* Are messages delivered at least once?

* Can messages be dropped entirely under pressure? (aka best effort)

* If they drop under pressure, how is pressure measured? Is there a back-pressure mechanism?

* Can consumers and producers look into the queue, or is it totally opaque?

* Is queueing unordered, FIFO or prioritised?

* Is there a broker or no broker?

* Does the broker own independent, named queues (topics, routes etc) or do producers and consumers need to coordinate their connections?

* Is queueing durable or ephemeral?

* Is durability achieved by writing every message to disk first, or by replicating messages across servers?

* Is queueing partially/totally consistent across a group of servers or divided up for maximal throughput?

* Is message posting transactional?

* Is message receiving transactional?

* Do consumers block on receive or can they check for new messages?

* Do producers block on send or can they check for queue fullness?

And there's probably a bunch more I've forgotten.
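Even an in-process queue surfaces several of these choices. A minimal Python sketch (standard library only, purely illustrative) of the back-pressure, best-effort, and blocking-vs-polling distinctions:

```python
import queue

q = queue.Queue(maxsize=2)  # bounded: fullness is the back-pressure signal

# Blocking vs. best-effort producers.
q.put("a")
q.put("b")
try:
    q.put_nowait("c")        # best effort: fails fast instead of blocking
    delivered = True
except queue.Full:
    delivered = False        # message dropped under pressure

# Blocking vs. polling consumers.
first = q.get()              # blocks until a message is available
second = q.get_nowait()      # raises queue.Empty rather than blocking
```

A real distributed queue faces the same choices, except the "queue full" signal has to travel over the network.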

The thing is that answers to these questions will fundamentally change both the functional and non-functional nature of your queueing system.

For example, a queue system giving best-effort, unordered, non-durable behaviour is going to run a lot faster. It also pushes a lot of work onto the application programmer. On the other hand, once-and-only-once, durable, consistent queues are a lot slower and screech to a halt under most partition conditions. But they also fit what most application developers expect to happen on their first encounter with queueing systems.

I work on a section of Cloud Foundry in my day job, and other teams have seen that different tasks require different queueing approaches.

For example, stuff like metrics is still useful under conditions of dropped messages, out-of-order messages and so on, because what's interesting is the statistics, not any one single measurement.

But a message like "start this app" requires much higher guarantees of ordering, durability, and delivery certainty. People get mad if your PaaS doesn't actually run the application you asked it to run.

So, just remember: queues are not queues. You need to compare delivered apples with lossy oranges.

As a note, the author observes that MQTT provides an option to select which delivery semantics you prefer (at-least-once, at-most-once / best-effort, once-and-only-once), but I can't see which one the benchmark is run for.




Outstanding comment. I'm working on a distributed queue, and those are exactly the questions I ask myself when building different parts of the system; your exposition is something I'm going to cut and paste and reread every time I work on my project. Btw, we both work at Pivotal :-) Just a question: "Is queueing partially/totally consistent across a group of servers or divided up for maximal throughput". I'm not sure I understand this fully, could you please extend that a bit with some more context? Thanks.


> Btw, we both work at Pivotal :-)

I'm at Labs in NYC on secondment to Cloud Foundry. You should come see us and do a tech talk!

> "Is queueing partially/totally consistent across a group of servers or divided up for maximal throughput"

I didn't do a good job of explaining this.

Basically, assuming the most popular queueing discipline -- FIFO -- you can either set up brokers to be highly available, or to scale approximately linearly, but not both.

This is because HA + FIFO + at-least-once queueing requires servers to coordinate the state of a queue and to either write to disk or replicate messages. It's really, really hard and it can ruin your day when there's a partition.

If you relax all the guarantees that make life easier for application developers, you can just send any old message to any old server. No server needs to coordinate with any other and so scaling is closer to linear.
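The "any old message to any old server" approach can be sketched as stateless dispatch. Here's a hypothetical illustration (the broker names are made up); each producer computes the target locally, so no broker ever coordinates with another:

```python
import hashlib

# Hypothetical broker names -- purely illustrative.
brokers = ["broker-0", "broker-1", "broker-2"]

def pick_broker(message: bytes) -> str:
    # Stateless dispatch: every producer can compute this locally,
    # so no server coordinates with any other. Global ordering is
    # lost, but scaling is close to linear.
    digest = hashlib.sha256(message).digest()
    return brokers[digest[0] % len(brokers)]

assignments = {pick_broker(m) for m in [b"a", b"b", b"c", b"d"]}
```

Random choice works just as well here; the point is that nothing needs consensus, which is exactly what HA + FIFO + at-least-once takes away.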

Amazon's SQS is a really good case study in what queues with looser guarantees can achieve.


I love NYC! That's tempting :-)

Thanks a lot for further specifying the cross-consistency guarantee. The system I'm developing is a lot more like Amazon's SQS: it gives guarantees about message durability / replication, but only provides weak ordering, for three reasons: 1) What you said: scalability. 2) Availability, since the system remains available in the minority partition as well. 3) My queue is fundamentally designed around the idea that messages must be acknowledged by consumers, or they are reissued after a job-specific retry time (which can be set to 0 if you really want at-most-once delivery, but that's a rare requirement). When you have auto-re-queue, strict ordering is basically useless, since it is violated every time a consumer does not acknowledge a message fast enough.
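That acknowledge-or-reissue scheme is similar in spirit to SQS's visibility timeout. A hypothetical sketch (the class and names are illustrative, not the actual system being described; time is passed in explicitly to keep it deterministic):

```python
class AckQueue:
    """Messages must be acknowledged, or they are reissued after retry_after."""

    def __init__(self, retry_after):
        self.retry_after = retry_after
        self.ready = []       # messages waiting to be delivered
        self.in_flight = {}   # msg_id -> (ack deadline, message)
        self.next_id = 0

    def put(self, message):
        self.ready.append(message)

    def get(self, now):
        # Reissue anything whose ack deadline has passed. This is why
        # strict FIFO cannot survive: a reissued message re-enters the
        # queue out of its original position.
        for msg_id, (deadline, message) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[msg_id]
                self.ready.append(message)
        if not self.ready:
            return None
        message = self.ready.pop(0)
        self.next_id += 1
        self.in_flight[self.next_id] = (now + self.retry_after, message)
        return (self.next_id, message)

    def ack(self, msg_id):
        self.in_flight.pop(msg_id, None)

q = AckQueue(retry_after=5.0)
q.put("start this app")
msg_id, msg = q.get(now=0.0)    # delivered once, awaiting ack
reissued = q.get(now=6.0)       # consumer never acked: delivered again
```

Setting retry_after so the deadline never fires is the at-most-once case; any finite value gives at-least-once, with duplicates whenever a consumer is merely slow rather than dead.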

However, non-strict ordering does not mean random: the system tries to approximate FIFO using each node's wall-clock timestamps. This means that while messages may be delivered in arbitrary order, usually what is queued first is served first. That's no guarantee at all from the POV of the developer, but it is a more general assurance that if there are N users waiting in a web application for something to get processed, the first in the queue will likely be served before the last.
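The timestamp-based approximation can be sketched as a merge of per-node streams, each sorted by its own wall clock (illustrative only; clock skew between nodes is exactly why this stays approximate):

```python
import heapq

# Two nodes, each stamping messages with its own wall clock.
# Clocks may be skewed, so this is only approximately arrival order.
node_a = [(100.0, "a1"), (103.0, "a2")]
node_b = [(101.5, "b1"), (102.0, "b2")]

# Serve by merging the per-node streams on timestamp.
merged = [msg for _, msg in heapq.merge(node_a, node_b)]
```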

p.s. sorry for errors, writing with my daughter pulling my arms :-)


Total consistency requires global consensus, and that puts a definite brake on performance.


Yep, but are we talking about consistency in the order of messages? That was not clear to me.


I was principally thinking of order, yes. When we application developers hear "queue" we typically assume it's going to have a FIFO discipline.


Thank you for covering this. Numbers are great, but it's the details that matter (and end up biting you in the ass). I have nearly a decade running and working with high-volume transactional message systems, and it came up fairly regularly that Product A (WMQ-LL, 0MQ, Tibco FTL, 29West, or Solace) was faster than Product B (WMQ, AMQP- Rabbit, Active, Swift- or Tibco) while ignoring recovery, transactionality, durability, reliability, ordering, and delivery (a favorite head-scratcher is when you expect FIFO and get unordered). Matching the requirements to the system and setting expectations, especially expectations around how and when it goes wrong, is so much more important than the benchmarks.


> And there's probably a bunch more I've forgotten

+ Latency vs throughput.

What would be really informative (and this is not remotely directed only at Jian Zhen's work here) is to see metrics across a spectrum per a given modality, per a standard topology and capacity. That way it would be clear almost at a glance which approach is the best fit for a given domain.

Also I wonder if the time has come for the software geeks to visit their fellow geeks in the Mech-E departments and check out the work to date on flow analysis … ;)


Excellent question regarding latency.

Tyler Treat, who wrote an excellent blog post on MQ performance, tested this and got a mean latency of 98.751015 ms for 1000000 messages.

https://github.com/tylertreat/mq-benchmarking/pull/5



I have not. But I am going to assume you are telling me to learn more about how to calculate my performance numbers. :) Fair point and I will bookmark for future reading.


Good benchmarking requires a lot of work. What you did is actually useful within the narrow context you're applying it to -- quickly validating design changes.

But if, for example, you wanted to send it into SIGMETRICS for publication, they'd expect a whole bunch of extra work and background material.

One thing that's tricky in our profession is that the space of all possible configurations and inputs is gigantic and we only tend to sample the very, very small parts of it that occur to us.


Yeah, for the record, the benchmarks I've run aren't at all scientific. Should also be looking at percentiles. That said, surge looks promising so nice work, Jian :)

My dream would be to build a framework which orchestrates testing across distributed senders/receivers. That would give you a much more accurate representation of performance and reliability instead of a single machine.
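On the percentiles point: the mean alone can hide a latency tail completely. A quick synthetic illustration (the numbers are made up, not from any real benchmark):

```python
import random
import statistics

# Synthetic per-message latencies in ms; the data is made up.
random.seed(42)
latencies = sorted(random.gauss(100, 10) for _ in range(10_000))
latencies[-100:] = [x * 50 for x in latencies[-100:]]  # inject a slow 1% tail

def percentile(sorted_values, p):
    # Nearest-rank percentile over an already-sorted list.
    index = min(len(sorted_values) - 1, int(len(sorted_values) * p / 100))
    return sorted_values[index]

mean = statistics.mean(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
# The tail drags the mean well above the median; p99 exposes the tail itself.
```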


Author here. A bit of a pleasant surprise to wake up and see this on HN.

Thanks for the detailed response. Here are some answers that hopefully help clarify things a bit.

* Are messages delivered once and only once?

A: MQTT allows QoS 0 (at most once), 1 (at least once), and 2 (exactly once). The performance numbers in the blog are for QoS 0.

However, SurgeMQ implements all three and there's unit tests for all three. I just haven't done the performance tests for QoS 1 and 2.

* Are messages delivered at least once?

SurgeMQ supports it, though the numbers posted are for QoS 0 (at most once).

* Can messages be dropped entirely under pressure? (aka best effort)

No. Currently no messages are dropped.

* If they drop under pressure, how is pressure measured? Is there a back-pressure mechanism?

See above.

* Can consumers and producers look into the queue, or is it totally opaque?

Not sure what this means..sorry..

* Is queueing unordered, FIFO or prioritised?

Ordered, FIFO... the MQTT spec requires that messages from publishers be delivered to subscribers in the same order.

* Is there a broker or no broker?

Brokered

* Does the broker own independent, named queues (topics, routes etc) or do producers and consumers need to coordinate their connections?

The broker uses topics to route. A publisher publishes to a topic; subscribers subscribe to multiple topics, with optional wildcards.
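For readers unfamiliar with MQTT's topic wildcards, a simplified matcher illustrates the rules: '+' matches exactly one level, '#' matches the remainder. This is a sketch of the MQTT 3.1.1 behaviour, ignoring edge cases such as $-prefixed topics:

```python
def topic_matches(filter_: str, topic: str) -> bool:
    """Simplified MQTT topic matching: '+' is one level, '#' is the rest."""
    filter_levels = filter_.split("/")
    topic_levels = topic.split("/")
    for i, f in enumerate(filter_levels):
        if f == "#":
            return True  # multi-level wildcard matches the remainder
        if i >= len(topic_levels):
            return False
        if f != "+" and f != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)
```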

* Is queueing durable or ephemeral?

Ephemeral currently. Though the MQTT spec requires that any unack'ed QoS 1 and 2 messages be redelivered when the server restarts or the client reconnects. So once SurgeMQ meets that part of the spec, it could be considered somewhat durable.

* Is durability achieved by writing every message to disk first, or by replicating messages across servers?

N/A

* Is queueing partially/totally consistent across a group of servers or divided up for maximal throughput?

Currently SurgeMQ is a single server w/ no clustering ability. However, the MQTT spec does mention a bridge capability, which is a poor man's cluster. Not yet implemented.

* Is message posting transactional?

QoS 1 and 2 start to be more transactional. QoS 0 is strictly fire and forget.

* Is message receiving transactional?

See above.

* Do consumers block on receive or can they check for new messages?

Block on receive.

* Do producers block on send or can they check for queue fullness?

Block on send.


Great reply.

I wasn't expecting to make you fill out a survey. I was just trying to show off scars I and others I know have accumulated over the years :)


Heh. No worries. Excellent list of questions. Like antirez, I am evernoting (is that a verb now?) this for future reference.


I believe by 'peeking into the queue' he means: is receiving the only way to see the messages, or can you inspect the list of messages yet to be received without actually calling receive?


Ah..k...no peeking beyond just the head of the queue.


>> * Can consumers and producers look into the queue, or is it totally opaque?

> Not sure what this means..sorry..

I'm definitely an amateur on this topic, but I suppose he's talking about privacy issues. If you're delivering secrets, you don't want every actor to be able to read the messages of all the others.


Ah, I missed that one.

Some queues let you "peek". It's not common because it weakens the whole concept of a queue and tends to be difficult to implement sanely.
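For the curious, "peek" in this sense means inspecting without consuming, which an in-process deque shows in a couple of lines (illustrative only; doing this across a network is where it gets hard):

```python
from collections import deque

q = deque(["m1", "m2", "m3"])

head = q[0]             # peek: look at the head without consuming it
received = q.popleft()  # receive: consume the head
remaining = list(q)     # the rest of the queue is untouched
```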


i was looking at rabbitmq - do you recommend an MQ that isn't a SaaS?


RabbitMQ is, at its origin, an installable package that can support a wide range of behaviours -- you still need to be mindful of which you choose. Depending on your settings, you can see orders-of-magnitude changes in throughput.

Disclaimer: RabbitMQ belongs to Pivotal, the same company I work for.


Author here.

There's really no comparison, tbh. RabbitMQ has been battle-tested for years, and SurgeMQ is just in its infancy. RabbitMQ also has a lot more enterprise security features that SurgeMQ doesn't have today. So if you are looking for a solution today, go w/ RabbitMQ.

SurgeMQ will hopefully get there someday, but it's not ready yet.


RabbitMQ is AMQP and mostly an "enterprise-class" thing. MQTT is more of an IoT messaging protocol; I don't think they're trying to do the same thing?


yea was just looking for an opinion - these guys know their stuff, i never get to leave the frontend so idk which mq to choose ;)



