

A Fast, Simple, Queue Built on MongoDB - BenjaminCoe
http://blog.attachments.me/post/9712307785/a-fast-simple-queue-built-on-mongodb

======
rbranson
Developer aversion to learning new tools is astounding sometimes. Background
workers and message queues have to be the most reinvented wheel of all time.
99% of the time, something free and great will already do what you want, and
they've thought of things you haven't. Why not redirect some of this energy
into using a tool like Celery[1]? Surely it's simpler to learn a tool than it
is to build a new one from scratch?

* What happens when jobs crash, fail, or get stuck?

* What about when the machines running the workers die or are isolated from the network?

* What happens if the backing store dies?

* How do you start, stop, restart, grow, and shrink your worker pool?

[1] http://ask.github.com/celery/getting-started/introduction.html
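
The first question in the list above (jobs that crash or get stuck) is often handled with a lease, sometimes called a visibility timeout: a claimed job becomes available again unless the worker acknowledges it in time. A minimal in-memory sketch of that idea follows. The `LeaseQueue` class and its method names are made up for illustration and are not taken from Celery, Resque, or the linked article.

```python
import time
from collections import deque


class LeaseQueue:
    """In-memory queue where a claimed job reappears if it is not acked in time."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock      # injectable so the behaviour can be tested
        self.ready = deque()    # (job_id, job) pairs waiting for a worker
        self.leased = {}        # job_id -> (job, lease expiry time)
        self.next_id = 0

    def put(self, job):
        self.next_id += 1
        self.ready.append((self.next_id, job))
        return self.next_id

    def _requeue_expired(self):
        # Jobs whose worker never acked go back to the end of the ready queue.
        now = self.clock()
        expired = [jid for jid, (_, expiry) in self.leased.items() if expiry <= now]
        for jid in expired:
            job, _ = self.leased.pop(jid)
            self.ready.append((jid, job))

    def claim(self):
        """Return (job_id, job), or None if nothing is ready."""
        self._requeue_expired()
        if not self.ready:
            return None
        jid, job = self.ready.popleft()
        self.leased[jid] = (job, self.clock() + self.lease_seconds)
        return jid, job

    def ack(self, job_id):
        """Mark a job finished; returns False if the job is not currently leased."""
        return self.leased.pop(job_id, None) is not None
```

A worker that dies after `claim()` simply never calls `ack()`, so the job is handed to another worker once the lease runs out. The sketch does not cover the other questions in the list (a dead backing store, managing the worker pool).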

~~~
tptacek
Your point is well taken, but doesn't Celery want RabbitMQ? That's an awful
lot of machinery for what is (in many apps) a very small problem.

We get to look at lots of people's Rails apps, and it appears that Resque is
emerging as the de facto answer to this problem. A nice attribute of Resque is
that for all the machinery Redis provides, it _also_ solves the "memcache"
problem and is probably a better utility player than $X-MQ.

If you have an app that already wants $X-MQ, Celery sounds very sensible. And
having apps that want $X-MQ is a good thing, too.

And, not reinventing this wheel also makes sense.

~~~
ctide
I was going to ask about Resque as well. There's also a port of it (I don't
know how well maintained it is, but it exists) that uses mongo as the backend
instead of redis if you want to avoid adding another platform to your app.

------
josephruscio
Perhaps I'm assuming too much, but it looks like you might be confusing
_notification_ with _queueing_. As you found with SQS, queueing systems can
make for poor notification mechanisms. Did you try using AWS SNS
(<http://aws.amazon.com/sns/>) for your first two use cases? We have similar
notification use-cases internally, and SNS has worked great.

~~~
BenjaminCoe
I don't know that this is quite what I'm looking for, but it's a worthwhile
read. It never ceases to amaze me how many things Amazon can prefix with
Simple.

------
xal
Doing a queue system where the workers wait in a busy-loop is pretty insane.

I'm saying that as the author of Delayed::Job, so I'm pretty much responsible
for those shenanigans.

~~~
LeafStorm
Out of curiosity, what strategy would you recommend instead? Something like
Redis' B[LR]POP, where the blocking can be handled on the network layer? Or a
different strategy entirely?

~~~
Todd
If you put this behind, say, a REST API instead of using MongoDB directly, the
clients could use long polling.

If you look at Twitter's Kestrel, it was implemented with a similar mechanism,
albeit leveraging some capabilities that were intrinsic to memcached.

I built a little queuing prototype using long polling and it works very well.
If you build it using an async server, you can run many clients while only
using a few threads.

There's still the question of how to block and/or wake on an enqueue event to
MongoDB vs. polling. The best approach may be to implement queueing using your
REST API, leveraging all of the advantages of the async server and use MongoDB
for a backing store or journal. This is similar to how Kestrel does it.

My prototype was very responsive, although I only got about 500 messages per
second of throughput on my MBA.
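
To make the long-polling idea concrete, here is a rough sketch (not the prototype described above) of a consumer that parks on an empty queue until a message arrives or a timeout passes, instead of spinning in a busy-loop. It uses Python's `asyncio`. The `long_poll` and `demo` names are invented for the example, and the HTTP layer is left out entirely.

```python
import asyncio


async def long_poll(queue, timeout):
    """Wait up to `timeout` seconds for a message; return None if none arrives."""
    try:
        return await asyncio.wait_for(queue.get(), timeout)
    except asyncio.TimeoutError:
        return None


async def demo():
    queue = asyncio.Queue()
    # The consumer starts first and waits on the empty queue without polling.
    waiter = asyncio.create_task(long_poll(queue, timeout=1.0))
    await asyncio.sleep(0.01)
    await queue.put("hello")          # the enqueue wakes the waiting consumer
    got = await waiter
    # With nothing queued, a second long poll gives up after its timeout.
    empty = await long_poll(queue, timeout=0.05)
    return got, empty
```

Because each waiting consumer is just a suspended coroutine, one event-loop thread can hold many of them open at once, which is the point about async servers made above.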

~~~
BenjaminCoe
I see what you're getting at. Behind the scenes, I am ultimately using an
event loop :) Having said that, since the MongoDB connection is persistent,
does long polling over HTTP make sense for the client?

I was thinking of doing a proof of concept in Node, that would at least behave
like it's getting pushed messages rather than pulling ... But, I couldn't
think of a way other than repeated setTimeouts.

Any thoughts?

~~~
Todd
Node would work well since it's designed to support async patterns. You're
right that the challenge is the polling model that you have to implement when
using MongoDB as the queue. To get throughput, you may have to use it as a
journal and implement the queue within your process. Check out Kestrel on
GitHub for an example.
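
A toy sketch of the "journal plus in-process queue" shape, under loud assumptions: a local append-only file stands in for MongoDB, the `JournaledQueue` class is hypothetical, and this is not how Kestrel is actually implemented. Reads and writes are served from memory, and the journal is only replayed on startup.

```python
import json
from collections import deque


class JournaledQueue:
    """Queue served from memory, with every operation appended to a journal."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.items = deque()
        self._replay()

    def _replay(self):
        # Rebuild the in-memory queue by re-applying the journaled operations.
        try:
            with open(self.journal_path, encoding="utf-8") as f:
                for line in f:
                    op, payload = json.loads(line)
                    if op == "put":
                        self.items.append(payload)
                    elif op == "get" and self.items:
                        self.items.popleft()
        except FileNotFoundError:
            pass  # first run: nothing to replay

    def _log(self, op, payload=None):
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps([op, payload]) + "\n")

    def put(self, payload):
        self._log("put", payload)   # journal first, then update memory
        self.items.append(payload)

    def get(self):
        if not self.items:
            return None
        self._log("get")
        return self.items.popleft()
```

A real version would need to flush or sync the journal, compact it over time, and wake blocked consumers on `put`. The sketch only shows why dequeues no longer need to poll the backing store.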

