This isn't terribly surprising to me, since I have an appreciation for what long-running transactions will do to a system, and I try to design systems to use transactions that are as short-lived as possible on OLTP systems. I realize that this should be explicitly mentioned in the docs, though; I'll fix that.
I'll also note that since the beginning Que has gone out of its way to use session-level locks, not transaction-level ones, to ensure that you can execute long-running jobs without the need to hold open a transaction while they work. So I don't see this so much as a flaw inherent in the library as something that people should keep in mind when they use it.
(It's also something that I expect will be much less of an issue in version 1.0, which is set up to use LISTEN/NOTIFY rather than a polling query to distribute most jobs. That said, 1.0 has been a relatively low priority for much of the last year, due to a lack of free time on my part and because I've never had any complaints about locking performance before. I hope I'll be able to get it out in the next few months.)
+1! I tried to clarify in the "Lessons Learnt" section that this isn't so much a problem with Que, but something that should be kept in mind for any kind of "hot" Postgres table (where "hot" means lots of deletions and lots of index lookups). (Although many queues are more vulnerable due to the nature of their locking mechanisms.)
But anyway, thanks for all the hard work on Que. The performance boost upon moving over from QC was nice, but I'd say that the major win was that I could eliminate 90% of the code where I was reaching into QC internal APIs to add metrics, logging, and other missing features.
> What we have implemented was a Python decorator to have the tasks actually fired after the function completes (thus after the transaction commit). In Go we achieve the same result in a simpler way using the defer statement. In my experience, all these solutions are local to the process that fires the task, so there is less risk of interaction with other processes; they're easy to implement and more robust compared to other solutions.
Oh yes, totally. I've seen this same pattern in Ruby before whereby a job enqueue is put on something like an ActiveRecord `after_commit` hook.
One (overly) pedantic observation is that this still leaves you with the possibility of having your transaction and data safely committed in your database, but with your job not enqueued if something happens to your process between the time of commit and time of enqueue. Admittedly though, this probably doesn't happen all that often in real life.
Probably the best answer I have is that we take this approach for the sheer convenience. We can do things like this:
```ruby
def create_user(email, password)
  User.transaction do
    user = User.create!(email: email)
    # make an account ID; queue job to create record in billing system
    # (job class names here are illustrative)
    user.update!(billing_account_id: SecureRandom.uuid)
    CreateBillingAccountJob.enqueue(user.id)
    # queue job to send an e-mail invite
    SendInviteJob.enqueue(user.id)
  end
end
```
If you're using an external queue such as RabbitMQ, there's basically no way to guarantee a queue message being fired without a "two-phase" transactional approach.
Let's say your use case is creating a user and sending an email verification email. You need to:
1. Start transaction.
2. Create the user.
3. Insert a tuple into an "events to fire" table.
4. Commit the transaction.
5. Fire the event.
6. Delete the tuple.
This ensures that if the transaction fails, nothing is fired; if the transaction succeeds, we guarantee that the user now exists, and that the command to fire the event also exists. Thus, if step 5 fails, we still have the "events to fire" tuple, and we can have a cron job running every minute that picks up stragglers. (In practice, you'll need to run steps 5-6 in a transaction that exclusively locks the tuple so that a concurrent cron job doesn't do it.)
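The flow above can be sketched in a few lines. This is a minimal model only: sqlite3 stands in for Postgres, the table and payload names are made up, and a list stands in for the external broker.

```python
import sqlite3

# In-memory stand-in for Postgres; table names are illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
db.execute("CREATE TABLE events_to_fire (id INTEGER PRIMARY KEY, payload TEXT)")
db.commit()

fired = []  # stand-in for the external broker (e.g. RabbitMQ)

# Steps 1-4: create the user and record the pending event in ONE transaction.
with db:  # sqlite3's connection context manager commits on success, rolls back on error
    db.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
    db.execute("INSERT INTO events_to_fire (payload) VALUES (?)",
               ("verify_email:a@example.com",))

# Steps 5-6: fire the event, then delete its tuple. If the process dies in
# between, a sweeper re-reads events_to_fire and retries -- duplicates are
# possible, lost events are not.
with db:
    for event_id, payload in db.execute(
            "SELECT id, payload FROM events_to_fire").fetchall():
        fired.append(payload)  # publish to the broker
        db.execute("DELETE FROM events_to_fire WHERE id = ?", (event_id,))

print(fired)
```

In real Postgres the sweeper and the inline worker would coordinate via row locks (e.g. `SELECT ... FOR UPDATE`), which the sqlite stand-in can't show.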
This way, you're risking duplicate events, but you'll never lose the event, unless RabbitMQ falls over. Duplicate events are of course unfortunate and undesirable, but missing events are worse.
Of course, with the above solution, it could be argued that your "events to fire" table is a queue itself. The good news is that RabbitMQ is much better at scaling a queue than Postgres, so the duplication of work isn't actually that much of a problem. The "events to fire" table can be optimized in various ways; Postgres supports "unlogged" tables, for example.
Another approach which will be available soon is Bottled Water, which gives you pretty much the above, except without needing to maintain the "events to fire" table. But it's definitely more complicated.
I am currently implementing a project in which we use Postgres to track job state (eg., run status, failures, timings, resource usage, related log entries), but Kafka as the actual queueing mechanism -- thus bypassing the challenges mentioned in the article but still getting the best of Postgres.
This way we have complete, introspectable, measurable history about every queue item. It greatly simplifies the Postgres part of it (state updates are always appends, no locking) and thanks to Kafka, increases performance and scalability.
It also adds a measure of safety: We can detect "lost" jobs that disappear because of data failure, bugs in Kafka, failing clients etc. We know that if a job was never logged as "complete", it probably died.
The job log also functions as an audit log, and we also intend to use it for certain jobs that benefit from being incremental and from being able to continue from where they last left off (for example, feed processing).
I'd be curious to hear about the mechanic that you came up with for division of labor among workers — since every client is essentially reading the same stream, I guess you'd have to distribute jobs based on job_id modulo worker_number or something like that?
Kafka divides a queue into partitions. Each partition is a completely independent silo. When you publish messages, Kafka distributes them across partitions. When you read messages, you always read from a partition.
This means partitions are also the unit of parallelism: You don't want multiple workers on a single partition (because of the labour division problem you mention). Rather, Kafka expects you to have one partition per worker.
This is more elegant than it sounds if you're coming from something like RabbitMQ. Partitions (i.e., queues) in Kafka are append-only and strictly linear; unlike RabbitMQ, you can never "nack" a message in a way that results in the message ending up at the back of the queue and thus violating the original message order. Rather, Kafka expects each consumer to maintain its "read position" in the queue. Failure handling, then, is simply a matter of winding back the read position. And unlike RabbitMQ, there's less need for complicated routing, dead-letter exchanges and so on, because rather than moving messages around, you're just moving a cursor.
Of course, message order is only preserved within a single partition; if you publish messages A, B and C and you have 3 partitions and 3 workers, then in the real world, messages may be processed in the order C, B, A. That sounds bad, but other queue solutions such as Que or RabbitMQ suffer from the exact same problem: if you run 3 workers against one queue, your queue may supply each worker with messages in the right order, but there's no guarantee that they will be processed in that order. The only way to guarantee ordering is to have just one worker per queue, using some kind of locking (RabbitMQ does support "exclusive" consumers). But then you don't get any parallelism at all. So I think Kafka's solution is quite sane, even if it's more low-level and less developer-friendly than AMQP.
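To make the partition/ordering point concrete, here's a toy model in plain Python. Nothing in it is Kafka's actual API; `publish`, the modulo partitioner, and the keys are all made up for illustration.

```python
# Toy model of Kafka-style partitioning: keyed messages always land in the
# same partition, so per-key order survives; global order across partitions
# does not.
NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

def publish(key, message):
    # key % NUM_PARTITIONS stands in for Kafka's partitioner
    partitions[key % NUM_PARTITIONS].append(message)

# Two "users" (keys 0 and 1) publishing interleaved messages:
for i, msg in enumerate(["A", "B", "C", "D", "E", "F"]):
    publish(i % 2, msg)

# Each consumer tracks its own read position in its partition; failure
# handling is just rewinding this cursor, not re-queueing the message.
offset = 0
first = partitions[0][offset]
offset += 1

print(partitions[0], partitions[1], first)
```

Key 0's messages (A, C, E) all share partition 0 and stay in publish order relative to each other, as do key 1's (B, D, F) in partition 1; there's simply no ordering relationship between the two partitions.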
FWIW, the transactional properties of a Postgres-backed queue were so convenient that we took advantage of them for a long time (and still do) despite the fact that they have a few caveats (e.g. poor degraded performance as outlined in the post), but more recently there's been a bit of a shift towards Sidekiq (probably because it's generally very problem-free and has some pretty nice monitoring tools).
(Disclaimer: I authored this article.)
We're super cautious about long-running transactions anyway, as they cause a load of other issues (e.g. http://www.databasesoup.com/2013/11/alter-table-and-downtime... - full blog post coming soon!)
We use JQJobs (which we authored) to manage queueing and it's architected such that it could be ported to Redis or some other better backing store, or potentially even to QC/Que, which I wasn't aware of until your article (so thanks for that!).
> Fortunately our job throughput is small enough to not hit any scaling issues with Postgres, so I stuck with that because of my confidence and experience w/Postgres over the years.
I think we're in a pretty similar situation. For what it's worth, I think that a queue in PG can scale up about as well as Postgres can as long as you keep an eye on the whole system (watch out for long-lived transactions and the like).
RQ has been good to us so far. There's a simple dashboard for it that works well enough. After messing around trying to find my data in RabbitMQ, it was a real relief to be able to query a simple set of Redis keys.
However, storing small, ephemeral messages like jobs in a queue within Postgres is a bad idea and the pain far outweighs the benefits of transactional rollback for jobs.
Instead, a much simpler solution is to plan for jobs to run at least once, use a more appropriate datastore like Redis or RabbitMQ, and build in idempotency and error handling at the job layer.
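An at-least-once job made safe by idempotency can be sketched like this. The names and the in-memory set are illustrative; in practice you'd back the key store with something durable, such as a unique index.

```python
# Sketch of at-least-once delivery made safe at the job layer: the queue may
# deliver the same job twice, but the side effect happens at most once
# because we record a per-job idempotency key before acting.
processed = set()    # stand-in for a persistent store (e.g. a unique index)
refunds_issued = []  # stand-in for the external side effect

def issue_refund(job_id, account_id):
    if job_id in processed:  # duplicate delivery: skip the side effect
        return
    processed.add(job_id)
    refunds_issued.append(account_id)

# The queue redelivers job 42 after a worker crash:
issue_refund(42, account_id=7)
issue_refund(42, account_id=7)
print(refunds_issued)
```

The check-then-add here isn't atomic; a real implementation would rely on the datastore (unique constraint, SETNX, etc.) to make the dedup race-free.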
Postgres used as a system of record shouldn't be used for ephemeral message queues.
For example, suppose you mark an account closed, commit, and then enqueue an event to issue a refund from another system. It's possible that your process may crash or be killed at just the wrong time leaving you with a closed account but no refund.
So what if you enqueue the event before you commit? In that case, you might crash before committing, which will automatically roll back. Now you've done a refund on a non-closed account.
Transactions make it trivial to guarantee that either both happen or neither do. There are other ways to get that guarantee, but they require more work and are more error prone.
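The both-or-neither property is easy to demonstrate with a small sketch; sqlite3 stands in for Postgres here, and the table names and simulated crash are illustrative.

```python
import sqlite3

# sqlite3 stands in for Postgres; table names are illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, kind TEXT)")
db.execute("INSERT INTO accounts (id, status) VALUES (1, 'open')")
db.commit()

try:
    with db:  # one transaction: both writes commit, or neither does
        db.execute("UPDATE accounts SET status = 'closed' WHERE id = 1")
        db.execute("INSERT INTO jobs (kind) VALUES ('refund')")
        raise RuntimeError("simulated crash before commit")
except RuntimeError:
    pass  # the context manager already rolled the transaction back

# The rollback undid BOTH writes: no closed account without a queued refund,
# and no queued refund against an open account.
status = db.execute("SELECT status FROM accounts WHERE id = 1").fetchone()[0]
job_count = db.execute("SELECT count(*) FROM jobs").fetchone()[0]
print(status, job_count)
```

Remove the `raise` and both the status change and the refund job commit together, which is exactly the guarantee that's hard to get once the queue lives outside the database.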