
Distributing Work Without Celery - craigkerstiens
http://justcramer.com/2012/05/04/distributing-work-without-celery/
======
shazow
I built something similar when I was working on SocialGrapple (running
hundreds of social graph aggregation and processing jobs every hour), called
turnip: <https://github.com/shazow/turnip>

It uses SQLAlchemy to generate a relational schema for the queue, so it works
on most relational databases. I used it with PostgreSQL.

One additional twist is that it supports scheduled and recurring jobs using
cron-like syntax. Otherwise it's the same idea, minus the automatic worker
scaling. The schema stores a reference to the job to call, along with its
parameters, and then you can spawn as many turnip workers as you want to
consume them.
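
The core pattern is roughly this (a quick sketch using SQLAlchemy Core, with
made-up table and column names rather than turnip's actual schema or API):

    import json
    from sqlalchemy import (create_engine, MetaData, Table, Column,
                            Integer, String, Text, DateTime, func, and_)

    metadata = MetaData()
    jobs = Table('jobs', metadata,
        Column('id', Integer, primary_key=True),
        Column('task_path', String(255)),  # e.g. "myapp.tasks.aggregate_graph"
        Column('params', Text),            # JSON-encoded kwargs for the task
        Column('status', String(16), default='pending'),
        Column('run_at', DateTime, default=func.now()),
    )

    engine = create_engine('postgresql:///socialgrapple')
    metadata.create_all(engine)

    def enqueue(conn, task_path, **kwargs):
        conn.execute(jobs.insert().values(
            task_path=task_path, params=json.dumps(kwargs)))

    def claim_one(conn):
        # Pick the oldest pending job, then flip it to 'running'. The
        # conditional UPDATE means that if two workers grab the same row,
        # only one of them wins (rowcount == 1); the loser just retries.
        row = conn.execute(
            jobs.select().where(jobs.c.status == 'pending')
                .order_by(jobs.c.run_at).limit(1)).fetchone()
        if row is None:
            return None
        won = conn.execute(
            jobs.update()
                .where(and_(jobs.c.id == row.id, jobs.c.status == 'pending'))
                .values(status='running')).rowcount
        return row if won else None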

Sadly I've stopped using it since I stopped working on SocialGrapple, so it's
not maintained anymore. But the code is simple enough and I'm happy to add
more documentation if anyone is interested.

------
benatkin
Here's another Python library with a job queue that doesn't depend on
RabbitMQ. It depends on Redis instead, which many have found easier to grasp.
I don't know how mature it is.

<http://blog.thoonk.com/> <https://github.com/andyet/thoonk.py>

~~~
zeeg
We also use Thoonk at Disqus, but this and a normal queue are definitely
trying to solve different problems.

------
heretohelp
I used to work for a company that had a similar problem (hundreds of millions
or billions of 'potential' work items).

We ended up hacking a well-tuned MySQL table into behaving as a work queue:
it distributed batches of work via an UPDATE query that also acted as an
atomic lock on the "in-flight" work items.
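
The shape of the trick, sketched against a made-up `work_items` table rather
than our actual schema, was roughly:

    import uuid
    import MySQLdb

    conn = MySQLdb.connect(db='work', user='worker')

    def claim_batch(batch_size=100):
        """Atomically mark a batch of pending rows as in-flight and return them."""
        token = uuid.uuid4().hex
        cur = conn.cursor()
        # The UPDATE is the lock: only one worker can flip a given row from
        # 'pending' to 'in_flight', so claimed batches never overlap.
        cur.execute(
            "UPDATE work_items SET status = 'in_flight', claimed_by = %s"
            " WHERE status = 'pending' ORDER BY id LIMIT %s",
            (token, batch_size))
        conn.commit()
        cur.execute(
            "SELECT id, payload FROM work_items WHERE claimed_by = %s",
            (token,))
        return cur.fetchall()

Each worker then chews through its batch, flipping rows to 'done' (or back to
'pending' on failure) as it goes.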

Not really the way you're supposed to do it, but hey, it was a helluva lot
easier than trying to make a RabbitMQ instance, or cluster for that matter,
retain the state of a billion rows. Ludicrous.

Message queues are really designed for just that: low-latency message
dispatch. Retaining long-standing work just isn't feasible with the design of
most MQs.

I'd love to see some shake-n-bake scalable database -> worker libraries crop
up so that I don't have to hack up something like that godawful MySQL table
again.

And before you ask, no, MapReduce isn't appropriate for this. The tricky part
is the race conditions and the retention of state long term, not the
distribution/parallelization of work.

Another aspect that proved interesting was keeping the MySQL instance from
getting eaten alive by workers asking for jobs to perform.
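
One mitigation is to have each worker back off whenever a poll comes up empty;
reusing the hypothetical claim_batch() from the sketch above, something like:

    import time
    import random

    def process(item_id, payload):
        pass  # the actual work goes here

    def worker_loop(min_delay=0.5, max_delay=30.0):
        delay = min_delay
        while True:
            batch = claim_batch()
            if batch:
                for item_id, payload in batch:
                    process(item_id, payload)
                delay = min_delay            # found work, poll again soon
            else:
                # Nothing pending: sleep with exponential backoff plus jitter
                # so idle workers don't all hammer MySQL in lockstep.
                time.sleep(delay + random.uniform(0, delay))
                delay = min(delay * 2, max_delay)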

EDIT: Re: 'shake-n-bake libraries', I'm considering hacking up what I have in
mind, if nothing else than to inspire some more talented coders to create a
framework or library that serves this need.

~~~
ericmoritz
Kestrel: <https://github.com/robey/kestrel>

~~~
heretohelp
Not strongly ordered enough for what I want. Cool though.

~~~
mattdeboard
I feel like if the problem the module in the OP addresses were a regular,
recurring one like you've described, Storm[1] would be a good fit. It uses
Zookeeper to manage state and maintains strong ordering[2].

1\. <https://github.com/nathanmarz/storm/wiki>
2\. <https://github.com/nathanmarz/storm/wiki/Transactional-topologies>

