
Celery – Best Practices - denibertovic
https://denibertovic.com/posts/celery-best-practices/
======
xenator
In many projects Celery is overkill. A common scenario I've seen:

  1. We have a problem, let's use Celery.
  2. Now we have one more problem.

I found [http://python-rq.org/](http://python-rq.org/) much handier, and it
covers most cases. It uses Redis as the queue broker. Flask and Django
integrations are available:
[https://github.com/mattupstate/flask-rq/](https://github.com/mattupstate/flask-rq/)
[https://github.com/ui/django-rq](https://github.com/ui/django-rq)

~~~
kapkapkap
Thanks for this, I had considered using Celery for a recent project but
ultimately backed away because I got the feeling it was more trouble than it
was worth. As a point of reference, would you say the learning curve for a
Celery setup is similar to that of Django? Not that there's anything terribly
hard about Django, but I'd agree that it's probably overkill if you're
relatively new to Python and are just looking for a quick way to produce some
HTML with no intent of developing it further.

~~~
goblin89
I wouldn't say Celery's learning curve is steeper than Django's, but it
definitely seems like overkill for your case. If you need to do some time-
consuming action periodically (and making an HTTP request by hand each time is
not an option), then you could just use cron to start with, if your project is
relatively simple. And if you literally just need to produce some HTML when
asked for it, then why are you considering an async task processor such as
Celery?

~~~
kapkapkap
Oh no, I would never use Celery for that. I was just comparing the learning
curve of Celery to Django's.

------
mickeyp
Good, basic practices to follow. Here's a few more:

\- If you're using AMQP/RabbitMQ as your result backend it will create a lot
of dead queues to store results in. These can easily overwhelm your RabbitMQ
server if you don't clear them out frequently. Newer releases of Celery will
do this daily, I think, but it's worth keeping in mind if your RMQ instance
falls over in prod.

\- Use chaining to build up "sequential" tasks that need doing, instead of
calling one after another in the same task (or worse, doing one big mouthful
of work in a single task), as Celery can prioritise many small tasks better
than one "master" task synchronously calling several tasks in a row.

\- Try to keep a consistent module import pattern for Celery tasks, or
explicitly name them, as Celery does a lot of magic in the background so that
task spawning is seamless to the developer. This is very important, as you
should never mix relative and absolute importing when dealing with tasks.
`from foo import mytask` may be picked up differently than `import foo`
followed by `foo.mytask` would be, resulting in some tasks not being picked
up by Celery(!)

\- Never passing database objects, as OP says, is good advice; but go one
step further and don't pass complex objects at all if you can avoid it. I
vaguely remember some of the urllib/httplib exceptions in Python not being
serializable and causing very cryptic errors if you didn't capture the
exception and sanitise it or re-raise your own.

\- Use proper configuration management to set up and configure Celery plus
what ever messaging broker/backend. There's nothing more frustrating than
spending your time trying to replicate somebody's half-assed Celery/Rabbit
configuration that they didn't nail down and test properly in a clean-room
environment.

~~~
yen223
With regards to #1: What happens is that if task_B depends on a value that
task_A returns, task_A will insert its value into the queue and task_B will
consume it.

If task_C returns a value that no other task cares about, it will insert the
value into the queue, where it never gets consumed. This is why dead queues
(also known as "tombstones") happen.

Always remember to set ignore_result=True for tasks whose return value is
never consumed.

EDIT: "Tombstones", not gravestones

~~~
denibertovic
In general, using AMQP for result storage is somewhat of a bad idea, I think.
But yes, I agree about ignoring results, seeing as most tasks I've seen in
the wild don't return anything at all. Hence #6 in the post.

~~~
yen223
Good advice.

Do note that if you use the chord pattern
([http://celery.readthedocs.org/en/latest/userguide/canvas.htm...](http://celery.readthedocs.org/en/latest/userguide/canvas.html#chords))
anywhere, you must set ignore_result=False

------
sylvinus
I've worked 4+ years with Celery on 3 different projects and found it
incredibly difficult to manage, both from the sysadmin and the coder point of
view.

With that experience, we wrote a task queue using Redis & gevent that puts
visibility & tooling first:
[http://github.com/pricingassistant/mrq](http://github.com/pricingassistant/mrq)

Would love to have some feedback on that!

~~~
john2x
Looks interesting. Can't find any links to the docs?

~~~
bduerst
I can't find any either - it looks like mrq is a front-end dashboard for
python-rq: [http://python-rq.org/](http://python-rq.org/)

~~~
sylvinus
MRQ is heavily inspired by RQ to which we switched from Celery
([http://www.slideshare.net/sylvinus/why-and-how-pricing-
assis...](http://www.slideshare.net/sylvinus/why-and-how-pricing-assistant-
migrated-from-celery-to-rq-parispy-2))

However, it is a complete rewrite, because we felt we couldn't add gevent
support and other features providing extreme visibility without major
changes. If you don't need those two things, you may want to check out RQ
instead for now; it's still a very good piece of software.

~~~
bduerst
Awesome!

I see now that mrq supports concurrency while python-rq does not (at least
not in a stable fashion).

I'll try mrq for the gevent integration. It's great that you guys are actively
working on improving it. Python-rq is great too, but it hasn't been updated in
a while and I don't think concurrency is on the radar.

------
misiti3780
I would add:

1\. Use task-specific logging if you have a bunch of tasks:
[http://blog.mapado.com/task-specific-logging-in-
celery/](http://blog.mapado.com/task-specific-logging-in-celery/)

2\. Use statsd counters to keep track of basic statistics (counts + timers)
for each task

3\. Use supervisor + monit to restart workers after a lack of activity (I
have seen this happen a few times and have never been able to track down
why, but it's an easy fix)

~~~
denibertovic
More awesome tips. Thank you.

------
ehurrell
Excellent resource, I remember wrestling with learning celery and how to do
some simple things, loved finding Flower to monitor things.

I will say, though, that Celery is probably overkill for a lot of the tasks
people think to use it for. In my case it was mandated to support scaling for
a startup that never launched, partly because they kept looking at new
technologies for problems they didn't have yet.

------
waffle_ss
I disagree with the characterization in #1 (although I can't speak to the
Celery particulars). I feel like if you have a job that is critical to your
business process, the job should be persisted to your database and created
within the same database transaction as whatever is kicking off the job.

Consider how background jobs are typically managed with RabbitMQ, Redis, etc.
They are usually created in an "after commit" hook from whatever gets
persisted to your relational database. In this scenario, there is a gap
between the database transaction being committed and the job being sent to and
persisted by RabbitMQ or Redis; during this gap the only record of that task
is being held in a process's memory.

If this process gets killed suddenly during this gap, that background job will
be lost forever. It sounds unlikely, but if RabbitMQ or Redis is down and the
process has to sit and retry, waiting for them to come back online, the gap
can be sizable.

~~~
jaegerpicker
I disagree with this; in my experience it's almost always a really bad idea
to use the DB as a queue. If RabbitMQ is down, the process should retry a
finite number of times (usually 3 in our use case) and then set a status on
the DB record. Then you have audits running to pick up records in that state
and retry the process once the system is back up and running. That way
nothing is lost and you gain all of the benefits of RabbitMQ.

~~~
queuesaredbs
_bad idea to use the DB as a queue_

Not according to Jim Gray. See "THESIS: Queues are Databases"[1][2]

1-
[http://research.microsoft.com/apps/pubs/default.aspx?id=6849...](http://research.microsoft.com/apps/pubs/default.aspx?id=68494)

2- (pdf)
[http://research.microsoft.com/pubs/69641/tr-95-56.pdf](http://research.microsoft.com/pubs/69641/tr-95-56.pdf)

~~~
denibertovic
Thanks for the link, nice read. There is a difference, though, between using
something as "a queue" and using something as an AMQP implementation when
it's clearly not one.

------
keosak
Points 1 and 2 are only valid because the Celery database backend
implementation uses generic SQLAlchemy. Chances are, if you are using a
relational database, it's PostgreSQL. And it does have an asynchronous
notification system (LISTEN, NOTIFY), and this system allows you to specify
which channel to listen/notify on.

With the psycopg2 module, you can use this mechanism together with select(),
so your worker thread(s) don't have to poll at all. They even have an example
in the documentation.

[http://www.postgresql.org/docs/9.3/interactive/sql-
notify.ht...](http://www.postgresql.org/docs/9.3/interactive/sql-notify.html)

[http://initd.org/psycopg/docs/advanced.html#async-
notify](http://initd.org/psycopg/docs/advanced.html#async-notify)

~~~
denibertovic
It is true that Postgres supports pub/sub, but unfortunately the Celery
broker driver does not take advantage of it. It would be great if we could
get support for it. Nevertheless, just because it has pub/sub doesn't mean
it's a full AMQP implementation. Also, there's the fact that most AMQP
solutions are in memory, whereas a database is on disk... which also has its
costs.

~~~
denibertovic
Anyone that's interested in Postgres's pub/sub might find this useful:
[https://denibertovic.com/talks/real-time-
notifications/#/](https://denibertovic.com/talks/real-time-notifications/#/)

Just slides though. Haven't gotten around to writing a post about it yet.

------
TwistedWeasel
Once you scale your worker pool up beyond a couple of machines you need some
sort of config management with Celery. We use SaltStack to manage a large pool
of celery workers and it does a pretty good job.

~~~
denibertovic
Indeed. I use Ansible myself.

------
TomaszZielinski
This is not a Celery-specific tip, but since Celery also likes to "tweak"
your logging configuration, you can use
[https://pypi.python.org/pypi/logging_tree](https://pypi.python.org/pypi/logging_tree)
to see what's going on under the hood.

~~~
natedub
You can disable Celery's automatic logging configuration by connecting a
listener to the setup_logging signal.

[https://celery.readthedocs.org/en/latest/userguide/signals.h...](https://celery.readthedocs.org/en/latest/userguide/signals.html#setup-
logging)

Of course, logging_tree is a great tool as well!

~~~
TomaszZielinski
Take a look at
[https://github.com/celery/celery/blob/v3.0.23/celery/utils/l...](https://github.com/celery/celery/blob/v3.0.23/celery/utils/log.py#L250)
\- it's an older version that I once checked but it seems to be patching
loggers unconditionally (i.e. outside any signal handler).

------
geertj
I've been looking at Python tasks queues recently. Does anyone have experience
on how Celery and rq stack up?

Rq is a lot smaller, more than 10x by line count. So if it works just as well,
I'd go with the simpler implementation.

~~~
xenator
I used both and ended up with RQ. Freedom of choice can be good, but only
when you are able to make a decision. The variety of backends and storages
forces you to understand how each component really works, and when you dig
into the details you find that they are not all equivalent. But you just need
something f--king working, and you don't want to pay another guy to maintain
a zoo of different products.

That is why I decided to use RQ: it is better to know the limitations of
something simple than to know the possibilities but not be able to make a
choice.

~~~
geertj
That's very helpful, thanks!

~~~
asksol
There are many differences, but most notably rq spawns one process per task.
Line count is a stupid metric: e.g. recently our line count doubled because
of our new coding style, and the majority of the source code is tests anyway.

------
zentrus
Passing objects to Celery and not querying for fresh objects is not always a
bad practice. If you have millions of rows in your database, querying for
them is going to slow you way down. In essence, the reason you shouldn't use
your database as the Celery backend is the same reason you might not want to
query the database for fresh objects. It depends on your use case, of course.
Passing plain values/strings should be strongly considered too, since
serializing and passing whole objects when you only need a single value is
not good either.

~~~
denibertovic
Oh, absolutely: values before objects. I said "serializing" more in the sense
that pickle (or whatever the default serializer is) is always used for
storing the arguments in the queue.

It always depends on your use case, but generally you want your application
to behave correctly, which means it has to have correct/fresh data... you
can't sacrifice correctness because of an inability to scale your database.

~~~
zentrus
Yes. I think saying "you can't sacrifice correctness because of an inability
to scale your database" is perhaps conveying the wrong message though. I mean,
your very first point is about database scaling issues and the advantages of
using something like RabbitMQ to avoid expensive SQL queries.

If you are processing a lot of data in Celery, you really want to try to avoid
performing _any_ database queries. This might mean re-architecting the system.
You might for example have insert-only tables (immutable objects) to address
this type of concern.

~~~
denibertovic
Fair point. I agree with this.

------
TomaszZielinski
If you combine Celery with supervisord it's important to check the official
config file[1]. At least two settings there are really important -
`stopwaitsecs=600` and `killasgroup=true`. If you don't use them you might end
up with a bunch of orphaned child Celery processes and your tasks might be
executed more than once.

[1]
[https://github.com/celery/celery/blob/ee46d0b78d8ffc068d5b80...](https://github.com/celery/celery/blob/ee46d0b78d8ffc068d5b80e9568a5a050c61d1a8/extra/supervisord/celeryd.conf#L18)
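The relevant fragment of such a config might look like this (the program name
and command line are placeholders; the last two settings are the ones that
matter):

```ini
[program:celery]
command=celery worker -A proj --loglevel=INFO
; Give long-running tasks time to finish before SIGKILL...
stopwaitsecs=600
; ...and make sure child processes die with the parent, so no orphaned
; workers keep running and execute tasks a second time.
killasgroup=true
```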

------
Eric_WVGG
Am I the only person who was genuinely disappointed that this wasn’t about the
vegetable?

It’s a sadly under-rated ingredient! The flavor is subtle but unmistakable.

~~~
denibertovic
Sorry to disappoint. Flower isn't about a real flower either. :P

------
oulipo
Wondering about something: if you need to have a long task (5s to 10s) in the
background, or even longer, for an AJAX request, what should you rather do:

\- use gevent + gunicorn, or Tornado, in order to keep a socket open while the
worker is processing the task?

\- use polling? (less efficient)

\- use websockets (but then the implementation is perhaps a bit more complex)

can you do this simply using Flask?

~~~
denibertovic
Hmm, it seems we're talking about two things here.

If your AJAX request requires long task processing and requires you to wait
for it, then this is not a background task any more; it's done in one of the
web server threads, and even if the thread outsources the task to another
process, it's still waiting on that process to finish before returning the
AJAX response. This is bad.

I'm not entirely convinced about websocket solutions in Python yet, but I've
been told flask-websockets is awesome. Nevertheless, this doesn't solve the
problem for you, because the request is just keeping an open line and waiting
for a response... blocking is bad.

The simplest advice I have is to have the AJAX request trigger a background
task and return immediately. The background task will then have some kind of
side effect (i.e. write some result to a database somewhere) which the AJAX
request can then look for with some kind of polling mechanism (on some other
endpoint). Of course you can complicate this a lot, depending on your needs,
but this seemed like the most straightforward solution.
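That trigger-and-poll pattern could be sketched in Flask roughly like this
(the routes, the in-memory result store, and the synchronous `long_task`
stand-in are all invented; a real app would use a Celery task and a
database):

```python
import uuid

from flask import Flask, jsonify

app = Flask(__name__)
RESULTS = {}  # stand-in for the real side-effect store (database, Redis, ...)

def long_task(task_id):
    # Real code: a Celery task that a worker runs some time later.
    RESULTS[task_id] = "some result"

@app.route("/start", methods=["POST"])
def start():
    task_id = str(uuid.uuid4())
    long_task(task_id)  # real code: long_task.delay(task_id), returns at once
    return jsonify({"task_id": task_id}), 202

@app.route("/status/<task_id>")
def status(task_id):
    # The AJAX side polls this endpoint until the side effect shows up.
    if task_id in RESULTS:
        return jsonify({"state": "ready", "result": RESULTS[task_id]})
    return jsonify({"state": "pending"}), 202
```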

~~~
zo1
" _I 'm not entirely convinced about websocket solutions in Python yet, but
I've been told flask-websockets is awesome. Nevertheless this doesn't solve
the problem for you. Cause the request is just keeping an open line and
waiting for a response....blocking is bad._" Tornado only blocks if you do
something silly. It's event based, and can keep hundreds of connections open
and waiting for it's async response event before actioning/responding the open
connection.

" _The most simplest advise I would have is to have the ajax request trigger a
background task and return immediately. The background task will then have
some kind of side effect (ie. write some result to a database somewhere) which
the ajax request can the look for with some kind of polling mechanism (on some
other endpoint)._ " Wow, overkill much? Polling is bad, and is exactly the
kind of _bad solution_ that a lot of these libraries are in place to prevent
developers from needing to do.

Websockets were made to solve the long-polling and poll-spamming that was
prevalent. Now all you have to do is keep a light, open web-socket connection
to the server. And the server, being async/evented, will respond when the task
is good and ready. Nice and clean.

~~~
denibertovic
"Tornado only blocks if you do something silly. It's event based, and can keep
hundreds of connections open and waiting for it's async response event before
actioning/responding the open connection." \- Yes pure tornado based apps are
probably fine if you know what you are doing.

"Wow, overkill much? Polling is bad, and is exactly the kind of bad solution
that a lot of these libraries are in place to prevent developers from needing
to do." \- Polling is not bad if you have a good use case. You just cannot do
non-blocking stuff with Django for instance, or it's very very hard and
tricky. Websockets also limit you with the number of connections you can have
open at once.

------
mataug
What about using Redis as a Celery backend? Redis has a pub/sub mechanism
which seems quite reliable, so no need to poll.

~~~
denibertovic
Redis is still not an AMQP, but yes Redis's Pub/Sub works quite nicely. Out of
all the brokers celery supports I'd recommend only RabbitMQ and Redis to
people.

~~~
mataug
Yeah, I've been using Redis with Celery in production to perform lots of
network-IO-related tasks on a low-end machine because:

a) Redis uses less memory

b) Redis is easier to set up

~~~
denibertovic
With container solutions like Docker and prebuilt images, the setup part is
kind of eliminated. Although I don't remember having any special
configuration issues with RabbitMQ either; it just works(TM). That's always
nice, right? :)

------
harlowja
As one of the authors of taskflow I'd like to give it a little shout-out
(since it can do similar things to Celery, hopefully more elegantly and
easily).

Pypi:
[https://pypi.python.org/pypi/taskflow](https://pypi.python.org/pypi/taskflow)

Comments, feedback and questions welcome :-)

------
stickperson
I've heard so much about Celery but still have no clue when it would be used.
Could someone give some specific examples of when you have used it? I don't
really even know what a distributed task is.

~~~
denibertovic
A background task is just something that's computed outside of the standard
http request/response process. So it's asynchronous in the sense that the
result will be computed sometime in the future, but you don't care when.

Distributed just means that you can have your task processing spread out
across multiple machines.

A specific example: let's say that after a user registers on your website for
the first time, you want to get a list of all their Facebook/Twitter friends.
This action will take a long time and is not vital to the registration/login
process, so you set a task to do it later, let the user proceed to the site,
and don't make them look at a spinner the whole time; when the friend list
becomes available it will show up on the website (on their profile or
wherever). Makes sense?

------
DrJ
I'd also add: Be wary of context dependent actions (e.g. render_template,
user.set_password, sign_url, base_url) as you aren't in the
application/request context inside of a celery task.

------
peedy
Has anybody been able to make a priority queue (with a single worker) in
celery?

Eg, execute other tasks only if there are no pending important tasks.

~~~
jordonwii
The FAQ question isn't very clear about it, but it doesn't look like it's
possible: [http://celery.readthedocs.org/en/latest/faq.html#does-
celery...](http://celery.readthedocs.org/en/latest/faq.html#does-celery-
support-task-priorities)

------
zrail
Small typo where you define `CELERY_ROUTES`. `my_taskA` should probably have
the routing key `for_task_A`, right?

~~~
denibertovic
Not really, that's just the name of the actual task itself, i.e. "def
my_taskA(a, b, c)".

~~~
zrail
This line `'my_taskA': {'queue': 'for_task_A', 'routing_key': 'for_task_B'},`
"for_task_B" should be "for_task_A" to match the CELERY_QUEUES definition.
Unless I'm misunderstanding what you're doing, of course.
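For reference, the corrected entry would make the routing key agree with the
queue it's bound to (task and queue names are the ones from the post):

```python
CELERY_ROUTES = {
    # routing_key now matches the for_task_A queue binding
    'my_taskA': {'queue': 'for_task_A', 'routing_key': 'for_task_A'},
}
```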

~~~
denibertovic
Ah no, you got that one perfectly I just didn't understand what was meant at
first. Fixed. Thank you.

------
stefantalpalaru
> when you have a proper AMQP like RabbitMQ

AMQP = Advanced Message Queuing Protocol, so it's wrong to say that a message
broker is "an AMQP". Also, give Redis a try: it's much easier to set up and
uses fewer resources.

We should probably talk about the elephant in the room when addressing
newbies: the Celery daemon needs to be restarted each time new tasks are added
or existing ones are modified. I got past that with the ugly hack of having
only one generic task[1] but people new to Celery need to know what they're
getting into.

[1]:
[https://github.com/stefantalpalaru/generic_celery_task](https://github.com/stefantalpalaru/generic_celery_task)

~~~
malinoff
Let me repeat: you don't need this to load/reload tasks. There is a
'pool_restart' broadcast command[1].

[1]:
[http://docs.celeryproject.org/en/latest/userguide/workers.ht...](http://docs.celeryproject.org/en/latest/userguide/workers.html#pool-
restart-command)

