With AMQP (specifically, RabbitMQ) I can set up a fan-out topology using a topic exchange, such that a publisher simply publishes to the exchange. Clients then bind their queues to this exchange with a routing-key pattern, and any messages matching the pattern will be copied there. If no clients have bound, messages disappear into the aether.
For example, the publisher can use the exchange "changes" to write messages with keys such as "create.post.123" or "update.user.321". A consumer can then bind a queue to the exchange with the patterns "create.#" and "update.#" to get all create and update messages. A different app can listen to just "*.user.*" to get anything related to users.
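The matching semantics used above are simple to sketch in pure Python (this mimics AMQP topic-exchange rules: `*` matches exactly one dot-separated word, `#` matches zero or more; the function name is mine, not part of any library):

```python
def topic_match(pattern: str, routing_key: str) -> bool:
    """AMQP-style topic matching: '*' = one word, '#' = zero or more words."""
    def match(pat, key):
        if not pat:
            return not key
        head, rest = pat[0], pat[1:]
        if head == "#":
            # '#' can swallow zero or more remaining words.
            return any(match(rest, key[i:]) for i in range(len(key) + 1))
        if key and (head == "*" or head == key[0]):
            return match(rest, key[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))

# A consumer bound with "create.#" receives every create event:
print(topic_match("create.#", "create.post.123"))   # True
print(topic_match("*.user.*", "update.user.321"))   # True
print(topic_match("*.user.*", "create.post.123"))   # False
```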
This is a really powerful tool, allowing publishers to be dumb firehoses and clients to surgically pick what they need. Of course you don't need exchanges and bindings to accomplish this, only topics.
To everyone who thinks this sounds scary, don't worry. You can bind dedicated "undelivered" and "dead letter" queues to exchanges to make sure that when routing fails, you don't lose any messages.
We're using RabbitMQ in a few newer projects and it's really a joy to work with!
What does Disque offer that's different?
BTW, thanks for Redis. It's one of my favourite pieces of software ever.
Edit: NVM, found this: http://antirez.com/news/88
- Queueing media files to be encoded/transcoded.
- Triggering a build of your application from a web interface.
Basically anything that you want to do where you might not want the web server to block until the entire task is done.
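The "enqueue and return immediately" pattern can be sketched with the standard library alone — here a background worker drains a queue while the handler returns right away (the handler and the "encoding" task are hypothetical stand-ins, and a real setup would use a broker rather than an in-process queue):

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    # Background worker: pulls jobs off the queue and runs them,
    # so the web handler never blocks on the slow task itself.
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut down
            break
        results.append(f"encoded {job}")
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_upload(filename):
    # The "web handler": enqueue the work and return immediately.
    jobs.put(filename)
    return "202 Accepted"

print(handle_upload("video.mp4"))  # 202 Accepted
jobs.join()                        # (demo only) wait for the worker
print(results)                     # ['encoded video.mp4']
```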
"Let's return as soon as possible rather than blocking" is not the real way to handle race conditions. All you are doing is reducing the surface (time) that such a thing can happen in. It makes more sense to plan these things out using something else (e.g. locking).
You can do some sort of websocket-y mechanism to tell the client that the server has completed a previous request, or polling, but both of those bring a lot of infrastructure overhead.
Look, in both cases, the request needs to be synchronous. You need to tell the customer that their purchase went through. If you're going to add a whole other bus into the thing and make it asynchronous and have an elaborate polling solution on the client, that's fine, but I would (and do) just have them wait the two extra seconds and either say "your card has been charged" or handle the rare case with "there has been an upstream error, please don't close this window while we retry the request".
- Improving response latency by moving deferrable tasks out of the page serving code
- Improving response latency by simplifying implementation of concurrent tasks (whether across cores or servers)
- Allowing capacity planning for average load instead of peak load for deferrable tasks
The bigger story is that message queues make a good substrate for building fault-tolerant, highly available, scalable distributed systems that are easy to reason about and have predictable performance. This is because system components can be cleanly decoupled, implemented with redundancy and scalability, the message queue itself takes over much of the complexity of failover and delivery/ordering guarantees, and the message queue makes an excellent point for monitoring the system, giving visibility into logjams and problems in a live system.
Why not just use RabbitMQ or Redis? Well, RabbitMQ is, in my experience, complex and fragile. It's got a ton of different features, which means you have to configure it to do anything beyond just the basics, and its management tools are somewhat lacking (why both rabbitmqctl and rabbitmqadmin?). I recently started switching to Redis, because Redis is pretty much plug and play. It just works, and even the minimal configuration necessary for this use case is very clear and simple. Moreover, it's got a very simple API for examining what's going on, no complex permissions management, vhosts, etc. Its only downside? It's not a message broker; it just has some of the right primitives to act like one.
This is not to say that RabbitMQ or Redis are bad. They are great for what they do. I simply don't want to use them as the backend for Celery for the reasons stated above.
rabbitmqctl is the core tool to interact with a node. It can show the cluster status, status of the node process etc., stop and start the app and so on. It works at a lower level.
rabbitmqadmin comes with the Management plugin, and is a client for the plugin's HTTP API. You have to enable the plugin to expose that API. The Management plugin adds some overhead, I believe (it samples statistics continually to serve through the API), and as a result it's optional. Not everyone would want or need to enable it.
rabbitmqctl does the basics, whereas rabbitmqadmin is a higher-level tool.
Which API are you referring to, specifically?
sudo rabbitmqctl -p example list_queues name messages messages_ready messages_unacknowledged consumers | grep 'celery \|other_celery '
It will be recorded and posted here as soon as possible ;-)
This is different enough from Redis that I don't see how they directly compete. And if you already have a system set up and no performance constraints with what you have, of course you shouldn't change.
However, there are clones like Shoryuken that would be easier to port.
Things I can say: the ruby disque library isn't really fleshed out yet. It's alpha, so that's fine, but it's got a ways to go. For example, it doesn't directly expose ACKJOB as a command. The server is equally alpha, things like HELP don't do anything yet, and some options on some commands appear broken. But hey, it was a fun way to spend an hour.
I built a failover system for it involving NFS mounts and heartbeat, but having it all be automatic would be quite nice. Looking forward to a prod-ready version.
From the README:
DIStributed QUEue but is also a joke with "dis" as negation (like in disorder) of the strict concept of queue, since Disque is not able to guarantee the strict ordering you expect from something called a queue. And because of this tradeoff it gains many other interesting things.
By resource allocation problem, I mean that jobs may be small (so that lots of them can occur in parallel) or large (occupying a significant fraction of a machine's CPU, memory, bandwidth, whatever), and may be mixed together. Trying to do too much can effectively crash a system with OOM killer or paging.
Does everybody just roll their own resource-based scheduler?
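The simplest home-rolled form of this is admission control: only start a job when its declared resource needs fit the machine's remaining budget. A sketch (the capacity figures and job sizes are made up for illustration):

```python
# Admission control: run a job only if its declared needs fit the
# machine's remaining budget; otherwise keep it queued for later.
CAPACITY = {"cpu": 8, "mem_gb": 32}

def schedule(jobs):
    """Greedily admit jobs that fit; return (admitted, deferred) names."""
    free = dict(CAPACITY)
    admitted, deferred = [], []
    for job in jobs:
        if all(job[r] <= free[r] for r in free):
            for r in free:
                free[r] -= job[r]
            admitted.append(job["name"])
        else:
            deferred.append(job["name"])
    return admitted, deferred

jobs = [
    {"name": "small-1", "cpu": 1, "mem_gb": 2},
    {"name": "big",     "cpu": 6, "mem_gb": 24},
    {"name": "small-2", "cpu": 2, "mem_gb": 8},  # won't fit after "big"
]
print(schedule(jobs))  # (['small-1', 'big'], ['small-2'])
```

This is exactly the kind of accounting that avoids the OOM-killer scenario above: mixed small and large jobs never collectively exceed the budget.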
There's a certain amount it can do, but ultimately, you have to apply backpressure at some point unless you're willing to start losing messages at the message broker layer.
After that, there's still a lot you could do: spool them to disk on the publisher, retry later, etc. You still risk message loss, filling up THOSE disks, etc... but again, something has to eventually give.
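Backpressure in its most basic form is just a bounded buffer: when it's full, the producer blocks or fails fast instead of the queue silently growing. A stdlib sketch of that idea:

```python
import queue

# A bounded buffer with capacity 2. Once full, put() must either
# block or fail fast -- that refusal IS the backpressure signal.
buf = queue.Queue(maxsize=2)

buf.put("msg-1")
buf.put("msg-2")

try:
    buf.put("msg-3", block=False)   # queue full: refuse instead of grow
except queue.Full:
    print("backpressure: producer must retry, spool to disk, or drop")

buf.get()          # a consumer drains one message...
buf.put("msg-3")   # ...and the producer can proceed again
print(buf.qsize())  # 2
```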
My first exposure to the resource allocation problem and solutions came at Google, whose system evolved into Kubernetes (now an open source project). It's so damn effective I hope it takes off everywhere.
We have this exact problem at the startup where I work. We have a home-made solution (a combo of Akka and Play for API / admin UI) that works OK, but really we'd prefer not to be in the business of writing job queues and schedulers.
Something big and cluster-oriented like Hadoop isn't a good fit; we typically only have one or two actual machines servicing jobs, because of our business model. Financial entities don't like their data being mixed with other people's data, so we give everybody tiny little networks of VMs, and can't farm work out to a giant cluster.
Redis is used as the backing storage for our homemade job queue. But without resource monitoring and allocation, Disque doesn't help with our problem. I'm sure it'll find much use elsewhere, though.
I suppose development on it has slowed to a crawl so it's no longer "actively" supported.
Some key differences, feature-wise. beanstalkd offers:
1. Bury / kick
2. Designed for speed
Whereas Disque offers:
1. Designed for HA, scalable, distributed
2. Fault tolerant
3. Peek at multiple jobs
With the amount of momentum behind Redis, and beanstalkd being fairly unknown, I'd guess Disque will gain popularity quite soon.
- disque: send to one, read from one. The message is handed off to N replicas, and efforts are made to avoid duplicated or dropped messages. Disque will refuse new messages if RAM is filled across the cluster.
- nsq: send to any. Read from all. NSQ nodes do not communicate or coordinate with each other. Since only one node originates the message, there's no duplication, but a node outage can drop messages, and a partition can isolate them with no consumer. NSQ can grow its queue beyond RAM, so it will keep accepting new messages even if it is partitioned from the consumer.
Personally I think NSQ's approach looks like it's doing a lot less work and achieving almost all the same guarantees.
I am basically using something I could fit in Redis + sync replication [in terms of data model / function] as a job queue presently so I suppose that is just where my mind jumps to.
I'll be interested in seeing comparisons vs other mq systems.
I don't know if this will follow the same philosophy, but I'll be keeping a close eye on this as it evolves.
RabbitMQ's clustering isn't great. It's sensitive to partitions, which can occur not just from actual network hiccups but also simply due to high CPU or I/O load, and it does not have a good strategy to recover from such partitions.
RabbitMQ is not multi-master by default. A queue is owned by a specific node, and if you have a partitioned cluster, that queue (and related objects such as exchanges and bindings) will disappear from other nodes.
You can patch RabbitMQ's clustering by enabling "high availability", which is their term for mirrored queues. Each queue will get a designated master, and be replicated to other nodes automatically. If a partition happens, the nodes elect a node to become a new master for a mirrored queue.
Unfortunately, this opens the cluster up to conflicts. Let's say you get a brief partition. Now all the nodes see each other again, and you have conflicting queues: node A used to be master of queue X, and now node B is also master of queue X. During the split, their contents diverged a little bit. But RabbitMQ has no way to consolidate the two masters, so the queue is not operational.
To fix this, you either need to reconstruct the queue manually (usually impossible from an application's point of view), wipe it (hardly a solution), or simply have RabbitMQ automatically pick a winning master and discard the other master(s). This mode is called "autoheal": RabbitMQ picks a winning partition (based on which has the most connected clients), and the losing master(s) are wiped and become slaves. This is coincidentally the only mode in which RabbitMQ can continue to run after a partition without manual intervention.
In practice, recovery has proved flaky for us. Nodes stay partitioned even after they should be able to see each other. We have also encountered a lot of bugs — for example, bindings or exchanges disappear on some nodes but not on others, or queues are inexplicably lost, or nodes otherwise just misbehave. We're on a cloud provider which is otherwise rock solid; of all the software (databases etc.) we employ in our clusters, RabbitMQ is the only one that misbehaves.
This is anecdotal, of course. Fortunately, the author of Jepsen, Kyle Kingsbury/"Aphyr", has done the maths to back this up, demonstrating that RabbitMQ's clustering is both theoretically and practically unsound.
This may be overly harsh. RabbitMQ is a decent project with a lot of features. Things like routing keys, flexible durability, TTLs and dead letter exchanges are great [†]. When it works, it works really well. But in the real world, I wouldn't want to run it with more than two nodes, and preferably not at all.
[†] Although unfortunately DLXes are effectively unusable in a looping topology configuration (i.e., for timed retries), as AMQP frames will increase indefinitely in size.
The dead-lettering process adds an array to the header of each dead-lettered message named x-death. This array contains an entry for each time the message was dead-lettered.
This table is never truncated. It exists in the AMQP frame and will grow indefinitely.
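The unbounded growth is easy to see in miniature: each pass through a retry loop (queue → expiry/rejection → DLX → back to the queue) appends another entry to the header, so the serialized message only ever grows. A toy model — not real AMQP framing, and the field names beyond x-death are simplified:

```python
# Toy model of the x-death header on a message cycling through a
# dead-letter retry loop.
message = {"body": b"payload", "headers": {"x-death": []}}

def dead_letter(msg, queue_name, reason="expired"):
    # Each dead-lettering appends an entry; nothing ever removes one.
    msg["headers"]["x-death"].append(
        {"queue": queue_name, "reason": reason, "exchange": "retry.dlx"}
    )

for attempt in range(5):
    dead_letter(message, "work")

# The header grows without bound with each retry cycle, and with it
# the frame carrying the message.
print(len(message["headers"]["x-death"]))  # 5
```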
The AMQP spec mandates that clients negotiate a frame-max size per connection, and I believe RabbitMQ is in violation of the spec through this behaviour if the client specifies a non-zero value. (RabbitMQ even ignored this negotiation prior to 3.1, and there's a document which indicates it ignores limits entirely.)
As a result, clients that follow frame-max strictly according to spec (such as amqplib for Node.js) will refuse to handle violating frames, which is understandable given that clients also like to have predictable buffer allocation.
I've now written my own wrapper to handle retries on the client side, so I might give it a try.
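A client-side retry wrapper like that can be as small as a decorator with exponential backoff. A sketch — the delays, attempt count, and exception type would be tuned to the real client library, and flaky_publish is just a stand-in:

```python
import functools
import time

def with_retries(attempts=3, base_delay=0.1):
    """Retry a flaky call with exponential backoff before giving up."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except ConnectionError:
                    if attempt == attempts - 1:
                        raise          # out of retries: surface the error
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorate

calls = {"n": 0}

@with_retries(attempts=3, base_delay=0.01)
def flaky_publish(msg):
    # Simulated broker hiccup: fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("broker hiccup")
    return f"published {msg}"

print(flaky_publish("job-1"))  # published job-1 (after two retries)
```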
- We see at peak 10k ops/second.
- Features we like: priority, TTR, pause-tube (while deploying new code).
- It responds really fast to stats requests (we feed them into a Stats/Graphite dashboard).
We love the Beanstalkd Admin Console panel. https://github.com/ptrofimov/beanstalk_console