
Adventures in message queues - fcambus
http://antirez.com/news/88
======
ChuckMcM
In my experience there are three things that will break here:

1) At-most-once is a bridge to an elementary school which has an inter-
dimensional connection to a universe filled with pit vipers. Kids will die,
and there is nothing you can do to stop it.

2) Messages are removed when acknowledged _or_ memory pressure forces them to
be kicked out. Black Pearl messages, those that sail in from out of nowhere,
and lonely widows (processes that never find out their loved ones are dead)
will result.

3) Messages are ordered using wall clock millisecond time. This will leave
your messages struggling to find their place in line, and messages that should
be dead will not be dead (the missing fragment problem).

Obviously all these are simply probabilistic trade-offs based on most likely
scenarios which result in arbitrarily small windows of vulnerability. No
window is small enough at scale over time.

Often when these things have bitten me it has been non-programming stuff. For
example: a clock that wouldn't follow NTP because it was too far ahead of what
NTP thought the time was, which an operator fixed by turning time back 8
seconds. A client library that was told messages arrive at most one time, and
so made a file deletion call on the arrival of a message; a restored node
holding that message managed to shoot it out before the operator could tell it
that it was coming back from a crash, and poof, damaged file. And one of my
favorites, in ordering: a system that rebooted after an initial crash
(resetting its sequence count) and put messages back into flight with the
wrong sequence numbers, but with legitimate sequence values. FWIW, these sorts
of things are especially challenging for distributed storage systems, because
files are, at their most abstract, little finite state machines that walk
through a very specific sequence of mutations, the order of which is critical
for correct operation.

My advice for folks building such systems: never depend on the 'time',
always assume at-least-once delivery, and build in-band error detection and
correction so you can compute the correct result from message stream 'n' even
when two or more invariants in your message protocol have been violated.

Good luck!

~~~
politician
I don't understand the purpose of at-most-once semantics in practice: So,
you've got some process where you don't care if the message goes through, but
you're willing to spend money on the compute/storage for it anyway? Why bother?

If you're designing a system with those semantics, is it because you're
hoping that it's exactly-once 99.999% of the time -- wink, wink, nudge, nudge --
so that your handlers don't have to be idempotent? What's the fallback plan
for the day that all of your messages go into a networking black hole?

~~~
stevewepay
I want a message queue to hold credit card authorization requests. I want to
ensure that the authorization requests occur at most once, since having a
customer's credit card charged twice is infinitely worse than not charging the
card.

If the request never goes through, then I can always get the customer to try
again, but if it gets charged more than once, then it's a huge headache, and
our customers get extremely angry and lose confidence that we know what we're
doing.

~~~
ChuckMcM
That is certainly doable but you need to separate the _message_ semantics from
the _credit card_ semantics. So let's say your authorization includes a nonce
which is a blend of the transaction id data such that it is unique to the
transaction. Now you send that through the message system, which delivers it
one or more times, but the _nonce_ identifies all of them as the same
transaction, so the credit card processor tosses the 'extra' ones away; it has
already done the work. In such a system the message passing infrastructure
needs to be at-least-once, and the protocol involved ensures that nothing bad
happens if you deliver messages twice. Then you can route around damage or
non-functioning nodes.

The reasoning at the protocol level then goes, "If I have never seen this
nonce, then the transaction has not been authorized" (very durable, easy to
implement), and "A request seen multiple times will only be recorded once"
(also very durable and easy to implement).
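
That nonce-based dedup rule can be sketched in a few lines (a toy, in-memory
example only; the class and method names are made up for illustration, and the
"charge" is a stand-in for a real authorization call):

```python
# Idempotent handler under at-least-once delivery: record every nonce the
# first time it is seen; redelivered messages with a known nonce are
# acknowledged but never re-executed.

class ChargeProcessor:
    def __init__(self):
        self.seen = {}  # nonce -> result of the first (and only) execution

    def handle(self, nonce, amount):
        if nonce in self.seen:           # duplicate delivery: do no new work
            return self.seen[nonce]
        result = f"charged {amount}"     # stand-in for the real authorization
        self.seen[nonce] = result        # record before acking the message
        return result

p = ChargeProcessor()
first = p.handle("txn-42", 100)
second = p.handle("txn-42", 100)         # at-least-once redelivery
assert first == second == "charged 100"
assert len(p.seen) == 1                  # the card was only charged once
```

The queue is free to deliver the message as many times as it needs to; the
protocol, not the transport, guarantees the charge happens once.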

But if the message infrastructure tries to guarantee at-most-once, and the
user writes their protocol as, "if I see this authorization I should do it,
because the infrastructure will only send it to me one time," then there will
come a time in that protocol's future where something blows up the invariants
in the message infrastructure, the message arrives twice, and the wrong thing
happens on the credit card, because the protocol trusted the infrastructure to
do the right thing.

Back in the RPC wars, people were arguing that it was so much "simpler" and
more "transparent" if a remote procedure call had the exact same semantics as
a local procedure call. And that is true, it would be, if you could manage
that similarity; except it has been proven over and over again that network
messaging systems can't get it completely correct.

~~~
stevewepay
Interesting, thanks for the insight. We are doing something similar to the
nonce that you mention, but your argument that at-least-once semantics is more
appropriate is very convincing to me, as is your RPC argument. I'm old enough
to have had to endure both ONC and DCE RPCs :)

------
antirez
I'm very sorry, credit for the questions goes to Jacques Chester, see
[https://news.ycombinator.com/item?id=8709146](https://news.ycombinator.com/item?id=8709146)
I made an error cut&pasting the wrong name of Adrian (Hi Adrian, sorry for
misquoting you!). Never blog and go to bed I guess, your post may magically be
top news on HN...

------
pixelmonkey
Seems like a similar design to Apache Kafka,
[http://kafka.apache.org](http://kafka.apache.org). AP, partial ordering
(Kafka does ordering within "partitions", but not topics).

One difference is that Disque "garbage collects" data once delivery semantics
are achieved (client acks) whereas Kafka holds onto all messages within an
SLA/TTL, allowing reprocessing. Disque tries to handle at-most-once in the
server whereas Kafka leaves it to the client.

It will be good to have some fresh ideas in this space, I think. A Redis-style
approach to message queues will be interesting, because the speed and client
library support are bound to be pretty good.

------
andrea_s
Maybe I'm missing something, but if it is important to guarantee that a
certain message will be dispatched and processed by a worker, why wouldn't an
RDBMS with appropriate transactional logic be the best solution?

~~~
Spearchucker
It would. There's an argument that things like once-only, guaranteed, and
in-order delivery are _business requirements_ which therefore have no place in
the message layer - [http://www.infoq.com/articles/no-reliable-messaging](http://www.infoq.com/articles/no-reliable-messaging).

------
acolyer
Credit for the questions is due to jacques_chester, not me! See
[https://news.ycombinator.com/item?id=8709146](https://news.ycombinator.com/item?id=8709146)

------
turingbook
>a few months ago I saw a comment in Hacker News, written by Adrian
Colyer...was commenting how different messaging systems have very different
set of features, properties, and without the details it is almost impossible
to evaluate the different choices, and to evaluate if one is faster than the
other because it has a better implementation, or simply offers a lot less
guarantees. So he wrote a set of questions one should ask when evaluating a
messaging system.

I cannot find the comment by @acolyer on HN. Can anyone help me?

~~~
discardorama
I think Salvatore mis-remembered. The comment was by @jacques_chester (he made
a joke about it below, but people didn't get it). The comment is in this
thread (HN doesn't let me link directly):
[https://news.ycombinator.com/item?id=8708921](https://news.ycombinator.com/item?id=8708921)
. Look for Jacques's comment in there; Salvatore replied to it.

I also wanted to find the original, and used Jacques's remark below as a
starting point for hunting it down.

~~~
antirez
Sorry Jacques, Adrian. This great (IMHO) set of questions was written by
Jacques indeed.

------
caf
I wonder what the point is in having "best effort FIFO"? If the application
has to be able to deal with unordered messages anyway, you might as well not
bother to try to maintain any kind of order.

It's as well to be hanged for a sheep as for a lamb.

~~~
ekimekim
In general, you want to consume messages "fairly", i.e. in a way that
minimizes the latency introduced by the queuing. Best-effort ordering gives
you this _most_ of the time, which is better than _none_ of the time.

~~~
antirez
Exactly. Imagine you use your message broker to coordinate a web app that
needs to apply effects to photos in a web interface. It is definitely a good
idea that users who arrive first are served first, but violating this by a
couple of milliseconds is not going to change the experience.

------
mappu
Ask HN: I'm in the market for a distributed message queue, for scheduling
background tasks -

Does anything support "regional" priorities, where jobs are popped based on a
combination of age + geographic/latency factors?

Also, what are recommended solutions for distributing job injection? My job
injection is basically solely dependent on time, so I envisage one node
(raft consensus?) determining all jobs to inject into the queue.

My queue volume is about 50 items/sec and nodes will be up to 400ms apart.

~~~
Rapzid
That sounds like a job for a scheduler, in the cluster sense. I'm actually
looking around for a job framework in .NET that supports custom schedulers,
but have yet to find something that supports resource-constrained scheduling.
It's all either about enqueueing something to be done NOW, or at a future
date. I haven't seen anything that supports custom scheduler implementations
on a per-job-type basis. They don't really distinguish between logging work to
be done and deciding whether or not it can be executed NOW.

------
isb
This looks very cool. At-least-once semantics are the way to go, because most
tasks require idempotence anyway, and that helps in dealing with multiple
delivery. Strict FIFO ordering is not always needed either, as long as you
avoid starvation - most of the time you need "reliable" deferred execution
("durable threads").

I started prototyping something along these lines on top of Riak (it is
incomplete - missing leases etc., but that should be straightforward to add):
[https://github.com/isbo/carousel](https://github.com/isbo/carousel) It is a
durable and loosely FIFO queue. It is AP because of Riak+CRDTs. It is a proof
of concept - it would be nice to build it on top of riak_core instead of as a
client library.

------
jtchang
When I first installed Redis years ago I was astounded at how easy it was to
get up and running. Compare this to the plethora of message brokers out there:
with the vast majority, you will spend the better half of a day trying to
figure out how to configure the damn thing.

My overall impression of message brokers is that RabbitMQ is a pain in the
ass to set up; celery is my go-to these days, with beanstalkd a close second
if I don't want too many of celery's features.

~~~
arturhoo
Agreed. I've used beanstalkd in multiple projects/companies, and its
simplicity, robustness, and especially stability have always impressed me.

There are client drivers for a wide range of languages, including Python and
Ruby (and also an ActiveJob-compatible library).

Installing it is as simple as downloading the source, running make, and then
running the binary. [http://kr.github.io/beanstalkd/](http://kr.github.io/beanstalkd/)

~~~
pentium10
We use Beanstalkd actively to process 50M emails per month, and it does a
great job. We have never had any issues with it, and setup is very easy. There
is also a nice admin console panel to help you during development and to
monitor your tubes:
[https://github.com/ptrofimov/beanstalk_console](https://github.com/ptrofimov/beanstalk_console)

------
sylvinus
FYI, Salvatore will speak at dotScale in Paris about Disque on June 8:
[http://dotscale.io](http://dotscale.io)

------
rdoherty
This has me excited for many reasons. Redis is an amazingly powerful, robust,
and reliable piece of technology. Also, I love reading antirez's blog posts
about the decisions behind Redis, so I can't wait to learn more about queueing
systems from him as he discusses Disque.

------
bcg1
This looks like a good effort, congratulations.

Personally I'm torn on the usefulness of generic brokers for all
circumstances... there are obvious advantages, but at the same time every
messaging problem scales and evolves differently so a broker can quickly
become just one more tail trying to wag the dog.

I am also interested in the architecture of tools like ZeroMQ and nanomsg,
where they provide messaging "primitives" and patterns that can easily be used
to compose larger systems, including having central brokers if that floats
your boat.

------
jraedisch
We recently switched from RabbitMQ to Redis queuing because we were not able
to implement a good enough priority queue for highly irregular workloads.
Prefetch would not work, since 2-minute workloads would block all following
messages. Timeout queues would somewhat rebalance messages, but large blocks
of messages would be queued at the same time and therefore processed as large
blocks. Now our workers listen on 10 queues/lists with different priorities
using BRPOP, and so far everything seems to work.
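
The trick here is that Redis's BRPOP checks its keys in the order given, so
listing queues from highest to lowest priority makes each worker drain urgent
work first. A minimal sketch of that behavior, with plain Python deques
standing in for Redis lists (queue names are illustrative):

```python
from collections import deque

# Three "lists", one per priority level.
queues = {"high": deque(), "mid": deque(), "low": deque()}

def brpop(names):
    """Mimic BRPOP's key ordering: pop from the first non-empty queue."""
    for name in names:
        if queues[name]:
            return name, queues[name].popleft()
    return None  # a real BRPOP would block here instead of returning

queues["low"].append("resize-photo")
queues["high"].append("charge-card")

# Urgent work always wins, regardless of enqueue order.
assert brpop(["high", "mid", "low"]) == ("high", "charge-card")
assert brpop(["high", "mid", "low"]) == ("low", "resize-photo")
```

With the real Redis command a worker would call
`BRPOP high mid low 0` in a loop and get the same priority behavior,
plus blocking when all queues are empty.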

------
latch
Unordered, in-memory queues shouldn't be anyone's go-to solution. I think
there's a time and place for these, and having at-least-once delivery is a
huge win over just using Redis, so I'm excited.

Still, unless you know exactly what you're doing, you should pick something
with strong ordering guarantees and that won't reject messages under memory
pressure (although rejecting new messages under memory pressure is A LOT
easier/better to handle than dropping old messages).

------
jpfr
Some big projects are currently making the switch to DDS-based pub/sub. [1,2]

Now that everybody is making QoS guarantees in pub/sub and message queues, is
there a real difference to the 10 year old tech deployed in boats, trains and
tanks?

[1] [http://www.omg.org/spec/DDS/1.2/](http://www.omg.org/spec/DDS/1.2/)

[2]
[http://design.ros2.org/articles/ros_on_dds.html](http://design.ros2.org/articles/ros_on_dds.html)

~~~
kweinber
Has anyone personally worked on a DDS project that is in actual production? I
have never even seen one, although I've worked in some of the industries where
it is supposedly a success.

------
arunoda
I think this has a lot of roots in NSQ. But NSQ has no replication support.

I think built-in replication is very nice to have. I would like to try this
once it arrives.

~~~
politician
NSQ writes messages sent to a topic onto all channels subscribing to that
topic, which can be used as a form of replication in a way that meshes well
with its at-least-once semantics.

~~~
arunoda
But that all happens inside one NSQ instance. I mean, what if that NSQ server
goes down?

We don't have that kind of guarantee in NSQ. To work around that, we need to
handle replication on our own by publishing the message to two different NSQ
servers.

~~~
mnutt
True, but with NSQ being brokerless, the lines are somewhat blurred between
client and server.

Each of our app servers has its own local nsqd. If that nsqd stops responding,
we take the app server out of the load balancer. The local nsqd publishes to
multiple channels, which get consumed by hosts in different datacenters.

There are still potential failure cases: nsqd loses connection to consumers,
messages start building up, then the machine somehow goes away. The only way
to prevent it is to take the machine out of the load balancer any time nsqd
messages start backing up, but we prioritize serving requests over sending
messages.

~~~
arunoda
Yes. I didn't want to rant about NSQ; its simplicity is the reason we chose
it.

We keep NSQ close to the consumer. We don't want our publisher to take
responsibility for the message processing. Once a message is pushed to the
queue, we need to make sure it gets processed one way or another.

That's why we publish the same message to multiple queues. All our DB
operations are idempotent, so we are okay with processing the same message
multiple times.

------
jacques_chester
I wish to assure all and sundry that Adrian Colyer is not my secret crime-
fighting identity, and vice versa :)

~~~
antirez
Sorry Jacques :-( Fixed... and thank you again for your comment.

~~~
jacques_chester
Hey, it gave me chance to crack a lame joke. I love those.

------
Lx1oG-AWb6h_ZG0
Will there be any way to set up machine affinity? I think Azure Service Bus
uses this mechanism (by specifying a partition key for a message) to enable
strict FIFO for a given partition.

------
andrewstuart
I didn't see any mention of dead letter queues. Does it support dead letters?
This is an extremely useful feature of Amazon SQS.

------
X-Istence
This reminds me of a talk at SCALE13x about NATS:
[http://nats.io](http://nats.io)

It's fast and scalable.

~~~
stevewilhelm
This might be why they sound similar:
[http://blogs.vmware.com/tribalknowledge/2010/03/vmware-
hires...](http://blogs.vmware.com/tribalknowledge/2010/03/vmware-hires-key-
developer-for-redis.html) and
[https://github.com/derekcollison?tab=repositories](https://github.com/derekcollison?tab=repositories)

------
aaa667
Is guaranteed at-most-once delivery impossible?

~~~
antirez
Exactly-once is impossible. Example: you build the whole system to be totally
consistent and transactional (basically a CP system). Then you deliver the
message to the client, and it dies without reporting whether the message was
acknowledged or not. You have two options:

1\. Re-issue the message. If you do that, it is possible that the now-crashed
client already processed it, and you end up with multiple delivery.

2\. Drop the message. If you do that, the client may not have processed the
message, and you end up with no delivery.

~~~
avar
Exactly-once is possible in practice if you effectively bring the client under
the umbrella of your transactions. E.g. you can (ab)use MySQL and other
RDBMSes to do something like:

1\. Have a table where each row is a "job" or "message" and has a status which
when enqueued is status = "unclaimed". You can also have a `last_changed`
column.

2\. Have workers consuming that table that GET_LOCK() a row and set status =
"processing" and hold the lock for the duration of the processing.

3\. When they're finished with the task they update status = "finished" and
unlock the row (or equivalently, disconnect).

This requires much tighter coupling between the queue and the queue consumer
(each client must maintain a lock / connection for the duration of processing
items).

But it means that:

* Nothing will ever pick up the item more than once due to the GET_LOCK().

* If the consumer dies the item is either just unlocked, or the status is "processing". You can as a matter of policy re-pick up "processing" with a `last_changed` in the distant past, alert on those items and manually inspect them.

* If the consumer processes the item successfully it'll set the status to "finished" and nothing will process the item again.

Now obviously this requires a lot more overhead & maintenance than what you
have in mind; in particular, it makes some of the failure cases be "the item
is delayed due to a consumer dying and won't be processed until a human looks
at whether it was actually finished".

But this is the sort of pattern that you might use e.g. if you have a queue
for sending out invoices. At work we use MySQL +
[https://metacpan.org/pod/Data::Consumer::MySQL](https://metacpan.org/pod/Data::Consumer::MySQL)
to do this.
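
A toy, in-memory rendering of the lock-plus-status pattern described above
(the real version uses MySQL rows and GET_LOCK(); the function and job names
here are purely illustrative):

```python
jobs = {}        # job_id -> "unclaimed" | "processing" | "finished"
locks = set()    # job_ids whose worker connection is still alive

def enqueue(job_id):
    jobs[job_id] = "unclaimed"

def claim(job_id):
    """A worker takes the row: GET_LOCK() plus a status update."""
    if jobs.get(job_id) == "unclaimed" and job_id not in locks:
        locks.add(job_id)
        jobs[job_id] = "processing"
        return True
    return False

def finish(job_id):
    jobs[job_id] = "finished"
    locks.discard(job_id)            # release the lock / disconnect

def worker_crashed(job_id):
    locks.discard(job_id)            # the DB drops the lock on disconnect
    # status stays "processing": policy (or a human) decides what happens next

enqueue("invoice-1")
assert claim("invoice-1")
assert not claim("invoice-1")        # nothing else can pick it up meanwhile
finish("invoice-1")
assert jobs["invoice-1"] == "finished"

enqueue("invoice-2")
claim("invoice-2")
worker_crashed("invoice-2")
# The ambiguous state the comment describes: processing, but lock gone.
assert jobs["invoice-2"] == "processing" and "invoice-2" not in locks
```

The last assertion is the whole point: a dead worker leaves a row that is
"processing" with no lock, which is exactly the case that needs a re-queue
policy or manual inspection.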

~~~
antirez
Even considering the client to be part of the distributed system in an
"active" way, cooperating toward the single-delivery goal, the part I don't
believe is reasonable is: "it makes some of the failure cases be 'the item is
delayed due to a consumer dying and won't be processed until a human looks at
whether it was actually finished'".

Moreover, what you describe here is more like at-most-once delivery with a log
of delivered messages, so that non-acknowledged entries can be inspected. I
don't see how this really qualifies as exactly-once. I guess exactly-once must
be honored automatically, eventually, even in the face of failures, in order
to qualify.

Isn't it just better to use an at-least-once queue, (trans)action-unique IDs,
and a CP store as the source of truth for the current state? That way you turn
all the sensitive operations into idempotent ones.

~~~
avar
Right, you don't _have_ to implement it like that, but the point is that the
job is guaranteed to be in one of these states:

* Hasn't been picked up yet

* Has been picked up, and currently has something processing it (because the lock is still there)

* Has been picked up, but whatever picked it up has gone away

* It's finished

Handling the jobs that haven't been picked up yet or are finished is easy. But
what do you do about the jobs where the processor has simply gone away?

Well, if they still hold the lock you can give them some more time, and if
they don't hold the lock presumably they died.

At that point you can decide how you want to handle that, do you just have
something re-queue those jobs after some set amount of time has passed, or
does someone manually look at the jobs and decide what to do?

As an example of something we use this for: you might have some transactional
e-mails to be sent out, where each recipient is a row in the queue table. You
have to generate an e-mail and pipe it to sendmail. Just before you shell out
to sendmail you mark the item as "processing"; then, depending on the sendmail
return value, you either mark the item as processed in the queue or re-queue
it.

There's obviously a race condition here where sendmail might have returned OK
to you but the DB server you're talking to blows up (so you can't set the
status as "finished"). No amount of having unique IDs in external systems is
going to help you because that'll just create the same problem. I.e. you have
some state outside of your system that needs to be manipulated exactly once,
and once that's done you'd like to mark the job as done.

In practice, once you get the things that can fail between "processing" and
"finished" down to some trivial logic, this sort of thing is _really_ reliable
as an "exactly once" queue. To the extent that it fails once in a blue moon,
you can usually just repair it manually, i.e. in this case see what your mail
logs say about which mails you sent out.

Redis obviously doesn't have the same strong storage guarantees as a
disk-backed RDBMS, but we also have a version of exactly this logic that runs
on top of Redis Sentinel:
[https://metacpan.org/pod/Queue::Q::ReliableFIFO::Redis](https://metacpan.org/pod/Queue::Q::ReliableFIFO::Redis)

It has the same queued/in-progress/finished queues using Redis lists; things
are moved between the lists atomically and processed by workers. But of
course, if a worker crashes and burns at the wrong time, you're stuck with
some items in the in-progress state and have to somehow figure out what to do
with them: either you blindly re-queue them (at least once), or you manually
check whether they finished their work and re-queue them as appropriate
(exactly once).
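
The queued / in-progress / finished flow can be sketched with plain Python
lists standing in for Redis lists (in real Redis the move between lists would
be a single atomic RPOPLPUSH; item names here are illustrative):

```python
from collections import deque

queued, in_progress, finished = deque(["a", "b"]), deque(), deque()

def claim():
    """Worker takes an item: atomic RPOPLPUSH in the real implementation."""
    item = queued.popleft()
    in_progress.append(item)
    return item

def ack(item):
    """Worker reports success: item moves to the finished list."""
    in_progress.remove(item)
    finished.append(item)

item = claim()
ack(item)          # normal path: "a" is done
claim()            # a worker takes "b"... and then crashes

assert list(finished) == ["a"]
assert list(in_progress) == ["b"]   # stuck item: blindly re-queue it
                                    # (at-least-once) or inspect manually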

------
JohnLen
How does ZeroMQ stand up?

~~~
lobster_johnson
ZeroMQ is not a message queue, nor is it a broker. You could build a message
queue with it, but it's not really a valid comparison.

~~~
JohnLen
I wasn't aware it's not a broker. Thanks for clearing that up.

~~~
stingraycharles
It's fairly trivial to build an in-memory broker with zmq, though. But you
should really think of zmq as building blocks for a message queue, or as
sockets on steroids.

It fills a need for certain applications, and it is very good at what it does.

------
andrewstuart
It would be nice to have a message queue system not built in Erlang or Java.

~~~
bcg1
[http://zeromq.org/](http://zeromq.org/)

~~~
lobster_johnson
ZeroMQ is not a message queue. It's in the name: it has zero message queue
capabilities. It does implement socket primitives you can use to build a queue
broker, but AFAIK no one has.

~~~
bcg1
[https://github.com/zeromq/malamute](https://github.com/zeromq/malamute)

~~~
lobster_johnson
Thanks. Looks unfinished. No persistence yet, according to the author.

