
RQ – Simple Job Queues for Python - gilad
https://github.com/rq/rq
======
pselbert
Redis is brilliant for simple job queues but it doesn’t have the structures
for more advanced features. Things like scheduled jobs can be done through
sorted sets and persistent jobs are possible by shifting jobs into backup
queues, but it is all a bit fragile. Streams, available in Redis 5+, can handle a lot
more use cases fluently, but you still can’t get scheduled jobs in the same
queue.
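
The sorted-set trick works roughly like this: score each job by its scheduled
run time, then periodically pop everything whose score has passed. A minimal
sketch, assuming redis-py (the key and queue names are made up):

    import time
    import redis

    r = redis.Redis()

    def schedule(job_id, run_at):
        # Score each job by its scheduled Unix timestamp.
        r.zadd('scheduled_jobs', {job_id: run_at})

    def promote_due_jobs():
        now = time.time()
        # Everything whose score (run time) has passed is due.
        for job_id in r.zrangebyscore('scheduled_jobs', 0, now):
            # Not atomic: a competing worker could grab the same member
            # between ZRANGEBYSCORE and ZREM -- the fragility mentioned
            # above, usually papered over with a Lua script.
            if r.zrem('scheduled_jobs', job_id):
                r.lpush('ready_jobs', job_id)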

After replicating most of Sidekiq’s pro and enterprise behavior using older
data structures I attempted to migrate to streams. What I discovered is that
all the features I really wanted were available in SQL (specifically
PostgreSQL). I’m not the first person to discover this, but it was such a
refreshing change.

That led me to develop a Postgres-based job processor in Elixir:
[https://github.com/sorentwo/oban](https://github.com/sorentwo/oban)

All the goodies only possible by gluing Redis structures together through lua
scripts were much more straightforward in an RDBMS. Who knows, maybe the
recent port of Disque to a Redis module will change things.

~~~
thejosh
Oban looks fantastic! PG is fantastic for a small-medium job queue IMHO.
pg_notify with Elixir's postgrex is fantastic.

We're currently using ecto_job which works really well for us, so we have no
reason to switch. Plus we like that it's in different tables.

~~~
pselbert
It seems like PG has been pigeonholed as only suitable for “small-medium” size
queues, but without numbers to define what “small-medium” means. A few million
jobs an hour is entirely reasonable for PG based on anecdata (and I've load
tested up to 54 million jobs an hour).

PG is an amazing tool that can handle more than most people think (or at least
more than I thought).

~~~
heavenlyblue
What about VACUUM? Did they add something to PostgreSQL to lower space
consumption?

~~~
pselbert
There has been some progress on that front. For index size there is a new
rebuild command in PG12 which works concurrently. There is also pluggable
storage now, which enables much more efficient row deletion. I can’t find a
link to the new storage format, but here is the announcement for 12
[https://www.postgresql.org/about/news/1943/](https://www.postgresql.org/about/news/1943/)

------
andrewstuart
Reposting from a while back in case it solved a problem for someone.

I use Postgres SKIP LOCKED as a queue. Postgres gives me everything I want. I
can also do priority queueing and sorting.

All the other queueing mechanisms I investigated were dramatically more
complex and heavyweight than Postgres SKIP LOCKED.

Here is a complete implementation - nothing needed but Postgres, Python and
psycopg2 driver:

        import random

        import psycopg2
        import psycopg2.extras

        db_params = {
            'database': 'jobs',
            'user': 'jobsuser',
            'password': 'superSecret',
            'host': '127.0.0.1',
            'port': '5432',
        }

        conn = psycopg2.connect(**db_params)
        cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

        def do_some_work(job_data):
            # Stand-in for real work: fails half the time on purpose.
            if random.choice([True, False]):
                print('do_some_work FAILED')
                raise Exception
            else:
                print('do_some_work SUCCESS')

        def process_job():
            # Atomically claim and remove one queue item. SKIP LOCKED lets
            # concurrent workers pass over rows that another worker has
            # already locked, instead of blocking on them.
            sql = """DELETE FROM message_queue
            WHERE id = (
                SELECT id
                FROM message_queue
                WHERE status = 'new'
                ORDER BY created ASC
                FOR UPDATE SKIP LOCKED
                LIMIT 1
            )
            RETURNING *;"""
            cur.execute(sql)
            queue_item = cur.fetchone()
            if queue_item is None:
                # Queue is empty; nothing to do.
                conn.commit()
                return
            print('message_queue says to process job id:', queue_item['target_id'])
            sql = """SELECT * FROM jobs
            WHERE id = %s AND status = 'new_waiting' AND attempts <= 3
            FOR UPDATE;"""
            cur.execute(sql, (queue_item['target_id'],))
            job_data = cur.fetchone()
            if job_data:
                try:
                    do_some_work(job_data)
                    sql = """UPDATE jobs SET status = 'complete' WHERE id = %s;"""
                    cur.execute(sql, (queue_item['target_id'],))
                except Exception:
                    sql = """UPDATE jobs SET status = 'failed',
                    attempts = attempts + 1 WHERE id = %s;"""
                    # to run the job again, insert a new item into the
                    # message queue with this job id
                    cur.execute(sql, (queue_item['target_id'],))
            else:
                print('no job found, did not get job id:', queue_item['target_id'])
            conn.commit()

        process_job()
        cur.close()
        conn.close()
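
For reference, here is a plausible setup for the two tables the snippet
queries. The table and column names come from the SQL above; the types and
defaults are my assumptions:

    import psycopg2

    setup_sql = """
    CREATE TABLE IF NOT EXISTS jobs (
        id       serial PRIMARY KEY,
        status   text    NOT NULL DEFAULT 'new_waiting',
        attempts integer NOT NULL DEFAULT 0
    );
    CREATE TABLE IF NOT EXISTS message_queue (
        id        serial PRIMARY KEY,
        target_id integer NOT NULL REFERENCES jobs (id),
        status    text    NOT NULL DEFAULT 'new',
        created   timestamptz NOT NULL DEFAULT now()
    );
    """

    conn = psycopg2.connect(database='jobs', user='jobsuser',
                            password='superSecret', host='127.0.0.1')
    # 'with conn' commits the transaction on successful exit.
    with conn, conn.cursor() as cur:
        cur.execute(setup_sql)
    conn.close()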

~~~
pacala
What happens if the process dies while processing a job? Is said job going to
remain locked and unserviced until the end of times?

~~~
xyzzy_plugh
Once the client session holding the lock disconnects, closing the session, the
lock is released. You can also set client options like
idle_in_transaction_session_timeout to force transactions to close after a
certain amount of idle time, which also releases the lock.
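
A hedged sketch of that second option with psycopg2 (the 60-second value is
arbitrary):

    # Per-session setting: if a transaction sits idle this long, Postgres
    # aborts it, which releases any row locks it was holding.
    cur.execute("SET idle_in_transaction_session_timeout = '60s'")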

~~~
nurettin
So the process that fails releases the lock, marking the job undone, which
lets another process lock it?

~~~
xyzzy_plugh
It means another process could grab it, yes. In the parent post's model, the
"scheduler" operates on an "at least once" model: unless the transaction is
committed, a premature failure results in the message being picked up again
by a scheduler, which could result in some duplicate work.

The parent post's model also tracks "attempts", but this only captures known
or acknowledged failures -- unacknowledged failures (i.e. crashes) will not be
recorded, so a task which explodes the process would be re-run ad infinitum.

An alternative method could be to record an attempt in a separate transaction,
so that the next scheduler execution can detect that the job/message was
serviced before, even if the message itself appears fresh.
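
A minimal sketch of that idea, reusing db_params and the jobs table from the
example upthread; the second autocommit connection is my own addition:

    import psycopg2

    # Separate connection whose writes commit immediately, independent of
    # the transaction doing the actual work.
    bookkeeping = psycopg2.connect(**db_params)
    bookkeeping.autocommit = True

    def record_attempt(job_id):
        # Bump the counter *before* doing the work. Even if the worker
        # process dies mid-job, the attempt is already durable, so a job
        # that always crashes won't be retried forever.
        with bookkeeping.cursor() as cur:
            cur.execute(
                "UPDATE jobs SET attempts = attempts + 1 WHERE id = %s",
                (job_id,),
            )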

------
andybak
Side niggle - I used to notice a lot of Django projects would use complex job
queues for absurdly low workloads. Beginners would get recommendations to use
RabbitMQ and Redis for sites that probably were only going to see a few
hundred concurrent users at most.

Seriously - don't add complex dependencies to your stack unless you need them.
The database makes a great task queue and the filesystem makes a great cache.
You really might not need anything more.

~~~
soperj
Yeah, that was my issue. Seemed like a ridiculous amount of stuff just to
process a few jobs once a day in the background. Eventually I found
django-background-tasks and have been using that ever since. It uses whatever
database you're already using, so it's very quick and easy.

~~~
wojcikstefan
At a glance,
[https://github.com/lilspikey/django-background-task](https://github.com/lilspikey/django-background-task)
indeed seems like a good and easy way to get started. You can later switch to
something more specialized once you hit the right scale or when you need more
sophisticated features.

~~~
slig
Just FYI, this fork is updated:
[https://github.com/arteria/django-background-tasks](https://github.com/arteria/django-background-tasks)

------
elamje
I currently use RQ. Here is the logic that led me to choosing it, then
wishing I had just used Postgres.

I need a queue to handle long-running jobs. I looked around the Python
ecosystem (because it's a Flask app) and found RQ. So now I add code to run
RQ. Then I add Redis to act as the queue. Then I realized I needed to track
jobs in the queue, so I put them into the DB to track their state. Now I
effectively have a circular dependency between my app, Redis, and my Postgres
DB. If Redis goes down, I'm not really sure what happens. If the DB goes down
I'm not really sure what's going on in Redis. This added undue complexity to
my small app. Since I'm trying to keep it simple, I recently found that you
can use Postgres as a pub/sub queue, which would have completely solved my
needs while making the app much easier to reason about. Using Postgres gives
you plenty of room to grow and buys you time to figure out a more durable
solution.
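
For the curious, the pub/sub piece is LISTEN/NOTIFY. A minimal consumer
sketch with psycopg2 (connection details and the "new_job" channel name are
made up; the publisher side would run NOTIFY new_job, '<id>' or SELECT
pg_notify(...)):

    import select
    import psycopg2

    conn = psycopg2.connect(database='jobs', user='jobsuser',
                            password='superSecret', host='127.0.0.1')
    conn.autocommit = True  # notifications are delivered after commit
    cur = conn.cursor()
    cur.execute("LISTEN new_job;")

    while True:
        # Block until the socket is readable (5s timeout), then drain
        # any pending notifications.
        if select.select([conn], [], [], 5) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            print('got notified about job id:', notify.payload)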

~~~
elcomet
Can't you track state in a "stateless" way? For example, have the job write
its progress to a file, so that if your app goes down you can just re-read
the file and know the state of your job.

~~~
rcfox
Relying on the filesystem has its own drawbacks. If you scale horizontally,
the local filesystem might not have the file you created because it's running
on a different server. Mounting storage across multiple servers will introduce
bottlenecks, and now you have to worry about the storage going down. If you're
running within a Docker container, you need to make sure the file doesn't live
inside the container or it won't be persisted.

------
kureikain
I got burned by this with RQ.

Say you have an object, and the object loads some config from a settings
module which in turn fetches it from the environment. No matter how many
times I restarted RQ, the config wouldn't change. Due to the code changes,
the old config caused the job to crash and keep retrying.

Until I got frustrated, went into Redis, and popped the job: all the settings
were in there. In other words, RQ serializes the whole object together with
all its properties.
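
A sketch of the gotcha as I understand it: enqueue a bound method and the
instance, including config captured at construction time, gets pickled into
the job. The Mailer class and env var here are hypothetical:

    import os
    from redis import Redis
    from rq import Queue

    class Mailer:
        def __init__(self):
            # Captured once, at enqueue time.
            self.smtp_host = os.environ.get('SMTP_HOST', 'localhost')

        def send(self, to):
            print('sending via', self.smtp_host)

    q = Queue(connection=Redis())

    # The instance (with its now-frozen smtp_host) travels inside the
    # serialized job; changing the env var and restarting workers won't
    # affect jobs already sitting in Redis.
    q.enqueue(Mailer().send, 'user@example.com')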

RQ isn't that good IMHO. You will have to add monitoring, health checks,
node pings, a scheduler, retries, and middleware like Celery has; eventually
it grows into a home-grown job queue system that makes it harder to onboard
new devs.

Just use Celery. Celery isn't that bloated. It has many UIs and backends to
support it and is very flexible. Celery beat scheduling is great as well.

------
_verandaguy
I used this in a project at a previous job -- I have to say, while the API is
simple and useful enough for small projects, it raises some issues with how
it's designed.

Instead of relying on trusted exposed endpoints and just invoking them by URL,
it does a bytecode dump of task functions and stores those in Redis before
restoring them from bytecode at execution time.

This has a few drawbacks:

- Payloads in the queue are potentially a fair bit larger for complex jobs

- Serialization for stuff that has decorators (and especially stateful ones,
like `lru_cache`) is not really possible, even with `dill` instead of
`pickle`

- It's not trivial, but this exposes a different set of security risks
compared to the alternative

I don't want to say this is a bad piece of software; it's super easy to set
up and way more lightweight than Celery, for example. But it's not my tool of
choice, having worked with the alternatives.

~~~
yamrzou
Which alternatives worked for you?

~~~
wakatime
Can't speak for op, but for us the ones based on RabbitMQ have performed well:

[https://github.com/Bogdanp/dramatiq](https://github.com/Bogdanp/dramatiq)

Stay away from Celery if you can, or stick to v3.2 because 4.x has a ton of
bugs.

~~~
scaryclam
Celery has always been a massive resource drain, and for pretty much zero
gain. I'd suggest just saying "stay away from Celery", full stop: it's easy
to get going with, but really hard to scale or work with when you need what
it promises. There are better options.

------
harikb
May I ask anyone posting stats on "millions per day" to please indicate how
many nodes/CPUs the entire system uses. For example, "8.6 million per day" is
only "100 per second". If that takes 100 CPUs... folks are underestimating
what a single modern node's CPU/network card is capable of.

~~~
wakatime
Task throughput isn't about the queue library. Your queue library should be
fast enough; after that, the bottleneck is the workload your tasks are doing.
That's why you need prioritized queues, so your slow long-running tasks run
at low priority and don't block the high-priority tasks that should be fast
and usually affect UX. We have 9 worker servers, each with 6 CPUs, and we
process over 1M tasks per day.

------
jholloway7
Not sure if the lower-level API of RQ supports this, but I tend to prefer
message-oriented jobs that don't couple the web app to the task handler like
the example does.

I don't want to import "count_words_at_url" just to make it a "job" because
that couples my web app runtime to whatever the job module needs to import
even though the web app runtime doesn't care how the job is handled.

I want to send a message "count-words" with the URL in the body of the message
and let a worker pick that up off the queue and handle it however it decides
without the web app needing any knowledge of the implementation. The web app
and worker app can have completely different runtime environments that
evolve/scale independently.
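
For what it's worth, RQ does let you enqueue by dotted string path, which
keeps the web app from importing the task module at all. A small sketch; the
'tasks.count_words_at_url' module path is hypothetical:

    from redis import Redis
    from rq import Queue

    q = Queue(connection=Redis())

    # The web app only knows the task's name and payload; only the worker
    # process needs the 'tasks' module importable.
    q.enqueue('tasks.count_words_at_url', 'https://example.com')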

~~~
mattbillenstein
Agreed - which is why I don't put my business logic in the webapp, or use
models coupled to the web framework.

The web framework is a way to handle http, rest, or graphql - deserializing
and serializing those protocols, not a way to handle my business logic.

Decoupling these things lets you write one-off scripts or have task queues
that don't need to load the context of a large web framework - they can just
be simple python.

------
nerdbaggy
I am a big fan of RQ along with Huey
([https://github.com/coleifer/huey](https://github.com/coleifer/huey))

~~~
rsrx
I also liked Huey because of its simplicity, and tried using it in a
commercial project, but it couldn't execute periodic tasks at less than
one-minute intervals, which puzzled me a bit. I needed to execute a task
every 10 seconds to reindex database changes into Elasticsearch.

After opening a GitHub issue about it and asking if there were plans to
implement sub-minute intervals, the library's author (coleifer on GitHub) had
a dismissive/arrogant attitude; his reply was something along the lines of
"too bad, you can fork it and do it by yourself", and he deleted my thread.

This threw me off from using this library, and I went back to Celery.

~~~
coleifer
Couple points:

* how did you settle on 10 seconds? Keeping two databases in sync is a complex process. I'd suggest that the difference between running 1x/min or 6x/min is negligible -- and if it's _not_ then probably you need something more sophisticated than a simple cronjob.

* I am providing free software. You are not contracting with me to provide developer support, so in my book I'm under no obligation to be courteous and polite all the time. I try to be most of the time, but nobody is perfect. Luckily the source is available if you don't want to talk to me. That was my point and I'm surprised that is so triggering to some people.

* Closing an issue is not deleting your thread.

~~~
rsrx
Thanks for giving me advice on database syncing, but I think that you kinda
missed the point.

It doesn't matter how I chose 10 seconds as the periodic task interval; the
problem was that, to a simple question about feature support, I got a
dismissive/borderline-rude answer which made me lose confidence in Huey as a
project and in its long-term maintainability. And it's not like I asked for
something exotic, just support for sub-minute periodic tasks, which almost
every other task queue has.

That issue doesn't seem to be anywhere on GitHub, so it was deleted:
[https://github.com/coleifer/huey/issues?utf8=%E2%9C%93&q=is%...](https://github.com/coleifer/huey/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aclosed+author%3Apaunovic)

As you said, it's a free software and you can do whatever you want with it,
but to me this is a huge red flag and I'm happy to not be part of it. ;)

~~~
coleifer
It's right here bro, in all its undeleted glory:

[https://github.com/coleifer/huey/issues/118](https://github.com/coleifer/huey/issues/118)

------
wakatime
We tried this at WakaTime and it didn't scale well. We also tried
[https://github.com/closeio/tasktiger](https://github.com/closeio/tasktiger)
but the only ones that work for us are based on RabbitMQ:

Celery<4.0

[https://github.com/Bogdanp/dramatiq](https://github.com/Bogdanp/dramatiq)

~~~
sdan
Can you please elaborate? I just adopted RQ a few weeks ago and it's working
fine for me (a few days before launch) with a couple thousand queries (1-2
jobs per 5 seconds) but a ton of connections (both making and taking jobs).
Any advice?

I'm thinking that after we go to prod, I'll move to Celery and MQ over time.

~~~
wakatime
This was multiple years ago so I don't remember details. I think it was
something around task reliability, performance, and architecture flaws that
meant we couldn't prioritize tasks, or some other missing features we needed.
We process around 50-1000 jobs per second.

------
sleavey
I've not much to say other than that this is a great little library! I used RQ
for a small project recently and found it to be pretty easy to use. As my
project grew bigger I also found it contained extra features I didn't realise
I'd need when I started.

~~~
remlov
RQ scales surprisingly well and for certain types of projects is a nice
lightweight alternative compared to more complicated job queues such as
Celery.

~~~
sleavey
Yeah, when I was looking at queues I was put off by Celery: somewhere in its
docs, or in a tutorial high up in the Google results, it was suggested to use
both Redis and RabbitMQ as some sort of brokers/results stores rather than
just one of those to handle both. I'm sure there were good reasons for that,
but I was looking for something simple and didn't want to learn multiple new
technologies, so in terms of simplicity RQ (which uses Redis for everything)
is pretty hard to beat.

~~~
disiplus
For a simple queue the broker is the only important part. I like Celery and
have projects in multiple languages using RabbitMQ.

------
mperham
Most job systems like Celery or Sidekiq are language-specific. If you are
looking for background jobs for any language, check out Faktory.

More advanced features like queue throttling and complex job workflows are
available.

[https://github.com/contribsys/faktory/wiki](https://github.com/contribsys/faktory/wiki)

------
wryun
[https://dramatiq.io/motivation.html](https://dramatiq.io/motivation.html)

------
wojcikstefan
Shameless plug: our team at Close was inspired by RQ when we created
TaskTiger –
[https://github.com/closeio/tasktiger](https://github.com/closeio/tasktiger).
We’ve been running it in production for a few years now, processing ~4M tasks
a day and it’s been a wonderful tool to work with. Curious to hear what y’all
think!

~~~
wakatime
We tried Tasktiger but ran into zombie tasks and some small but annoying bugs
while testing. Around that same time, we found Dramatiq and currently have
Dramatiq + Celery<4.0 in use.

~~~
wojcikstefan
Interesting, we haven't had issues with zombie tasks at all (and we DID have
issues with them when using Celery). Did you manage to find out what was
causing them?

> and some small but annoying bugs while testing

Do you recall any details?

Appreciate you trying TaskTiger, even if you've moved on since!

~~~
wakatime
Here's the list of my issues:
[https://github.com/closeio/tasktiger/issues?utf8=%E2%9C%93&q...](https://github.com/closeio/tasktiger/issues?utf8=%E2%9C%93&q=is%3Aissue+author%3Aalanhamlett)

It came down to liking the features of RabbitMQ:

* RabbitMQ scales messages without needing tons of RAM

* I don't have to decide between not persisting messages to disk with Redis vs only using half the machine's RAM [1]

* Queue and task visibility is better in RabbitMQ

* Support for purging all tasks in a queue

* TaskTiger had lower throughput than Celery and Dramatiq, maybe needs lazy-forking?

Things I did like about TaskTiger:

* no feature bloat, and it's possible to actually read its source code [2]

[1]: To enable persisting to disk (Redis fork + save snapshotting) you must
limit Redis to only using half the available RAM on a machine.

[2]: Celery is split up into multiple convoluted, bloated, difficult to read
repos.

------
iddan
RQ is horrible. We used it at K Health and migrated to Celery, as
configuration, optimisation, and monitoring were all really hard with RQ. It
takes ten seconds to get started but days to get to production. Not a good
trade-off!

~~~
jeffdico
Not sure RQ was the problem here. I have used it in dev and production and I
didn't have any issues with it at all.

------
mattbillenstein
I've used rq in prod at a couple places - nice little library.

I really like the design of beanstalkd and I used that at one place, but using
rq + redis was one less thing to deploy and/or fail.

------
nijave
Having used Resque for production workloads, I wouldn't want to use Redis as
a job queue. It works fine for small workloads but doesn't have many
resiliency capabilities and doesn't scale well (cluster mode drops support
for some set operations, so you probably can't use that). Replication is
async and Redis is largely single-threaded, so you end up with a SPOF
bottleneck in the middle of a distributed work system.

~~~
antirez
You may want to check the Disque module I published a few weeks ago:
synchronous replication and strong delivery guarantees under failures, plus
best-effort algorithms to minimize re-delivery of at-least-once messages.

------
adamcharnock
Different to RQ, but related: I recently released Lightbus [1], which is
aimed at providing simple inter-process messaging for Python. Like RQ it is
also Redis-backed (streams), but differs in that it is a communication bus
rather than a queue for background jobs.

[1]: [https://lightbus.org](https://lightbus.org)

------
akx
We were unsatisfied with RQ (I forget which parts, but I seem to recall
there was lots of code that strictly wasn't needed) and wrote
[http://github.com/valohai/minique](http://github.com/valohai/minique)
instead.

For less simple use cases, just use Celery.

------
sdan
I've used RQ in production applications as recently as last week. It's
pretty basic, so there are upsides and downsides, but so far it simply works!
I may opt for Celery down the line, but at the current state of my project,
RQ helped me ship quickly and iterate.

------
alexnewman
We use
[https://github.com/josiahcarlson/rpqueue](https://github.com/josiahcarlson/rpqueue)
and I absolutely love it

------
orf
Just use Celery. I know several teams who made the, in hindsight, very poor
choice of using RQ. By the time you realize it's a bad decision, it's very
hard to get out of.

Unless your use case is absurdly simple, and will always and forever be
absurdly simple, Celery will fit better. Otherwise you find yourself adding
more and more things on top of RQ until you have a shoddy version of Celery.
You can also pick and choose any broker with Celery, which is fantastic for
when you realize you need RabbitMQ.

~~~
nikisweeting
Our experience has been different, moving from Celery to Dramatiq saved us a
bunch of headaches and stability has been markedly improved.

------
tschellenbach
Celery + RabbitMQ is a great option for Python projects.

------
matisoffn
Like Sidekiq but for Python. Neat.

------
ankut04
For Python, Celery seems good.

