
Code rant: The Database As Queue Anti-Pattern - FrancescoRizzi
http://mikehadlow.blogspot.se/2012/04/database-as-queue-anti-pattern.html
======
btilly
The database is not the world's most efficient way to do this. However if it
isn't a bottleneck, who cares? It is one less dependency to maintain,
understand, etc.

We're looking to create working solutions, not works of art. Seeking the
perfect component for every piece is often simply not worthwhile.

~~~
bunderbunder
Amen. Sacrificing pragmatism on the altar of perfectionism is also an anti-
pattern.

------
redcircle
I find it ironic that he writes this as a rant, which is an anti-pattern for
influencing people. One of his commenters hints at how it might have been
better presented: introduce messaging systems, explain how they solve the
problem, and describe their advantages over the database. For further
discussion on why a rant is probably an anti-pattern, see Dale Carnegie's "How
to Win Friends and Influence People."

------
tarr11
Using databases as a queue is not an anti-pattern (delayed_job is an example).

A full blown message queue system is often overkill; it adds another system
that you have to learn. (And frankly, queues are often backed by databases
anyway)

That said, it's best to avoid having application state in the queue, it should
be generic. This makes it more scalable and flexible.

~~~
Retric
Also, if your using the DB as a queue index on the status element. Inserts are
still low overhead and the DB can handle poling that table 10,000 time a
second without issue.

Edit: Just don't pole the 'finished' status as that can have a lot of elements
in it.

~~~
koenigdavidmj
A lot of databases can also provide partial indices, which are only maintained
and searched when a WHERE clause matches (example: in this case, _WHERE status
!= 'filtered'_ would be a good choice).

------
gaius
But no-one polls a database anymore; they all have a callback mechanism an app
can just subscribe to.

------
dclaysmith
I use MySql+cron as a simple message queue every now and again--it's great for
jobs like sending confirmation emails or registering new users for a mailing
list. ie. Tasks that make synchronous calls over the network and can degrade
the speed of important processes (registration, etc). You just have to make
sure that these tasks are going to be infrequent and not prone to spikes that
might overwhelm the server.

------
jgrahamc
A lot of that post is specific to SQL Server. I used a MySQL table as a queue
as part of Signal Spam (<https://www.signal-spam.fr/>) and it worked
flawlessly. Polling was at 1 minute intervals and queue entries that were
completed were simply deleted.

~~~
jpitz
That particular use-case is such an anti-pattern in SQL Server ( because SQL
Server isn't or hasn't until recently been MVCC ) that Microsoft added Service
Broker to the product.

It isn't nearly the anti-pattern, at least for many loads, in an MVCC database
engine.

~~~
endersshadow
SQL Server has been MVCC for seven years[1]...

And anyway, I disagree with that it's an anti-pattern. If you want to capture
information about each step of a process, SQL Server (or any RDBMS) is great
at that. However, if what you want is a workflow that your RDBMS is
facilitating, and you really don't care about the intermediate steps, you just
want to make sure it gets done, then there are a lot of workflow systems out
there that do a great job. I'm also a big fan of combining workflow software
with logging the steps to SQL Server for a straight-forward process.

But again, it depends on what data's being stored and what types of queuing
you're doing. One project I've done that comes to mind is to take multiple
sources, grab the changes, then store the combined, conformed data in an
operational data store, while flagging the rows that were changed. Downstream
systems then pick up the changed rows based on their last run and pull those
changes down--completely oblivious to the fact that multiple sources are
coming together. In this case, you could abstract it away to the fact that
it's a queuing system, but there are multiple inputs and multiple endpoints,
so a persistent store with that data is paramount.

[1]:
[http://en.wikipedia.org/wiki/Multiversion_concurrency_contro...](http://en.wikipedia.org/wiki/Multiversion_concurrency_control#Databases_with_MVCC)

~~~
jpitz
>SQL Server has been MVCC for seven years

You're technically correct, which is the best kind of correct to be. Still,
there are nuances: snapshot isolation is not the default, and not many people
are aware of it, much less use it.

So far as the pattern/antipattern - I agree wholeheartedly with you. I would
put it in the category of flagged patterns - the usage characteristics and
especially the usage growth need to be carefully understood when doing this.

------
pbreit
I would be happy to use something else but every time I start to look at
ZeroMQ or celery, I lose interest and just go back to the DB which
surprisingly, seems a lot easier to use. Is there a delayed_job for Python?

------
tszming
Let's face it, not all systems in the world need to be "web scaled", as the
author also said: "Simple, use the right tool for the job".

But you need to understand not all jobs are created equal..

