
Don't Use an RDBMS for Messaging - DrJokepu
https://functionwhatwhat.com/why-you-should-not-use-an-rdbms-for-messaging/
======
chanks
The major benefit of putting your queue in your RDBMS, which isn't commonly
brought up in these discussions, is that it lets you protect your jobs with
the same ACID guarantees as the rest of your data. This is very valuable for
some use cases.

I have a Postgres-based job queue that uses advisory locks to get around some
of the drawbacks he mentions (job lock queries don't incur writes or block one
another like SELECT FOR UPDATE would). Feedback is welcome:
[https://github.com/chanks/que](https://github.com/chanks/que)
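
For illustration, the advisory-lock pattern looks roughly like this (table and column names here are hypothetical, not Que's actual schema, and a real implementation also has to guard against the planner evaluating the lock function on more rows than it returns):

```sql
-- Try to claim one job without writing to its row or blocking other workers:
SELECT job_id, payload
FROM jobs
WHERE pg_try_advisory_lock(job_id)
LIMIT 1;

-- When the work is done, delete the job and release the session-level lock
-- (42 standing in for the claimed job_id):
DELETE FROM jobs WHERE job_id = 42;
SELECT pg_advisory_unlock(42);
```

Because `pg_try_advisory_lock` never blocks, idle workers don't queue up behind each other the way `SELECT FOR UPDATE` waiters do.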

~~~
dhaivatpandya
You can provide the ACID guarantees by using something like Redis.

~~~
chanks
It can only be consistent with the rest of your data if the rest of your data
is also in Redis.

------
freework
My rule of thumb is: first build it with a database, then break things off
into NoSQL/queues/memfs/etc. after scale increases to the point where it
becomes a pain. It's never a good idea to start by building something in the
most optimized way possible (in terms of performance). In my experience, 99%
of the time actual performance numbers are way lower than what you expected
anyway.

~~~
craigching
I don't consider using a messaging system premature optimization at all if you
actually need messaging. That's just using the right tool for the job.

~~~
vidarh
"Messaging" and "messaging system" is too nebulous.

Why are you not using a mail server?

I'm totally serious. Been there, done that (used qmail as a message bus). It
works great for certain types of requirements.

Sometimes you badly need persistence. Sometimes it doesn't matter. Sometimes
you need massive scalability. Sometimes you need reporting. Sometimes you need
to be able to provide a dashboard to visualize pending messages. Sometimes you
need to be able to publish to large numbers of listeners at once. Sometimes a
small number of listeners needs to aggregate events from thousands or millions
of sources. Sometimes you need a message to only be processed exactly once.

Often you can nail down some of those issues from the outset, as well as
scale. But often you won't know in advance.

More importantly: Your system will almost certainly use a database server.
Unless you have clear indications that your needs can't be met by the database
server, adding a "messaging system" is one more dependency which you don't yet
know whether or not you'll actually need. One more thing that can fail.

~~~
craigching
> Why are you not using a mail server?

If that's the right tool for the job, then I wouldn't argue with you. But my
messaging needs are typically solved by tools like WebSphere MQ, RabbitMQ,
etc.

------
yangyang
There is an interesting post about locking and performance when implementing
queues in PostgreSQL by one of the PostgreSQL contributors here:
[http://johtopg.blogspot.se/2010/12/queues-in-sql.html](http://johtopg.blogspot.se/2010/12/queues-in-sql.html)

~~~
baudehlo
This is a great blog post. Anyone implementing this sort of thing in Pg needs
to read it.

------
ddorian43
[http://www.reddit.com/r/programming/comments/2hkget/using_yo...](http://www.reddit.com/r/programming/comments/2hkget/using_your_rdbms_for_messaging_is_totally_ok/)

------
jpalomaki
Two reasons why I have ended up using an RDBMS for messaging: 1) Transactions
can easily cover both DB and messaging operations (distributed transactions
are not easy/possible on all platforms). 2) The database is there anyway; no
need to add another critical component to the stack.

~~~
baudehlo
I think a better reason is that jobs and messages can be relational with the
other tables in the system, and so can provide the same guarantees that the
rest of the db offers.

------
saurabhnanda
This is a topic that I'm actively researching for my startup
(www.vacationlabs.com) and would love to know what other experienced people
think.

Our Rails app has grown over time and we ended up doing everything in a
monolithic way to get stuff out faster. However, now the app is so big and
tightly coupled that newer members of the team find it very hard to
contribute. Therefore we are thinking of breaking it down into smaller apps
which talk to each other using some sort of APIs -- either JSON/HTTP or MQ.

The MQ use case being considered is this -- the core transactional part of the
system handles just that - maintaining tour availability and keeping track of
payments. Everything else will be hived off to a mini-app which is notified of
booking/payment related changes via a pub/sub model. So the email notification
system will subscribe to the new-booking, booking-cancelled, payment-received,
etc events and will send out emails appropriately. The billing system will
subscribe to another set of events and will bill the customer appropriately.
And so on.

Is this the right use case for an MQ? Is using an MQ worth the additional dev-
ops complexity that it brings along? In our case message delivery needs to be
guaranteed, or extremely important business functions will not work. How do we
deal with MQ unavailability? Normally, if the DB is unavailable your system is
down. Is this how the MQ should also be treated? If not, how do you deal with
the situation where the core DB transaction has completed, but for some reason
you're unable to publish an event to the MQ? If the pub/sub system were built
on top of the DB itself, this problem would not arise, because publishing an
event would be part of your DB transaction itself.

Is there a sane way to build a pub/sub architecture on top of a DB, especially
Postgres, which is what we're using? If not, any recommendations for which MQ
we should be using for a guaranteed-delivery pub/sub model?

~~~
lobster_johnson
While it sounds like it could be a case for a queue, the fact that your
workflow is mission-critical is itself a reason not to use one. In particular,
a reason not to use RabbitMQ.

Some message queues are more reliable than others. RabbitMQ is designed to be
clustered, and its handling of partition tolerance has been shown [1] to be
very bad, something that I have experienced personally in a production system.
Don't ever use it if losing messages will be a problem; and never use it with
automatic acking (you'll want messages to be retried if your workers die mid-
stream). RabbitMQ _can_ be reliable if your boxes are all on a native (not
cloud VM-based) LAN that is stable, and your machines don't occasionally get
so overloaded that it impacts network connectivity.

One possibility is to use a message queue purely for signaling, not for state
-- instead, use databases and APIs to transmit actual state. For example,
let's say you want to shoot off an email every time there is a new booking.
The "emailer" app listens to events published by the "booking" app. But the
event doesn't contain any information about the booking; instead, the event
simply says that there was a booking. When the "emailer" app receives this
event, it asks the "booking" app for new bookings that it doesn't know about;
it processes each booking, first recording a (booking_id, email_id) row, then
firing off the email, then committing that row.

This makes every participant in the workflow idempotent, because they can run
the same piece of logic many times and still produce the same result. If you
ever have a problem with the queue going down, you can simply execute the
exact same code: You don't need to replay any missing events. You only need to
worry about multiple listeners (multiple "emailer" workers) waking up from the
same notification and doing the emailing for the same bookings. This is why
you must update your email log table transactionally. You don't necessarily
need to use explicit database locks, but such a system needs _some_ kind of
locking to be absolutely atomic.
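
A minimal sketch of that email-log idea in SQL (the schema here is hypothetical; the unique constraint, rather than an explicit lock, is what makes a duplicate wake-up harmless):

```sql
CREATE TABLE emails_sent (
  booking_id integer NOT NULL,
  email_kind text    NOT NULL,
  sent_at    timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (booking_id, email_kind)
);

BEGIN;
-- A second worker handling the same booking hits a unique violation
-- here and simply skips the booking:
INSERT INTO emails_sent (booking_id, email_kind)
VALUES (1234, 'booking_confirmation');
-- ...fire off the email...
COMMIT;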

The nice thing about this workflow is that you can make it extra bulletproof
by making the "check for bookings to email about" logic run, say, every ten
minutes -- _in addition_ to responding to the message queue events. If the
queue isn't working, your app will still be doing the emailing, just a little
slower. In other words, the queue simply becomes a trigger mechanism.

[1] [http://aphyr.com/posts/315-call-me-maybe-rabbitmq](http://aphyr.com/posts/315-call-me-maybe-rabbitmq)

------
lobster_johnson
While the article isn't wrong, it's increasingly my belief that "messaging" is
something of a misnomer, because it's used to refer to two rather different
concepts: events and jobs.

We've been using RabbitMQ for messaging for a long time. Some messages are
purely reactive: An object was changed, so some other subscribing application
needs to react by updating its data structures to reflect the change. This is
an event.

Other messages represent jobs. Jobs are tasks that are queued up for
processing. Processing photos, for example, is a classic job.

Jobs don't fall that well into the RabbitMQ/AMQP framework because it's
designed for messaging and is, despite support for things like durability and
dead-letter exchanges, largely about ephemeral data. RabbitMQ is pretty good
with events; its excellent routing support, for example, makes it trivial to
build a simple pub/sub architecture, and it's pretty fast.

But one big difference between events and jobs is that events are "fan out"
(every logical type of listener gets routed one copy of each event, because
they may be subscribing for different reasons), whereas jobs are directed
(only one processor can do the job, and there is always just one processor
working on a single job at any given time).

Another difference is that a job can be in various states. With events, you
don't care about completion: A consumed event has been consumed. But with jobs
it's terribly useful to track the final status of everything. Jobs should be
pausable and long-running jobs should be able to report on their progress
("34% of subjects mutated") along the way. Jobs should arguably be capable of
being re-run, even if they succeeded. The performance of jobs -- how long they
took, how long they had to wait, their failure rate, the rate of their various
progress metrics (516 photos processed), etc. -- is something you want to
track and aggregate.

But as it happens, relational databases excel at the sort of stuff you need
for jobs. I haven't made any decisions yet, but I'm really itching to create a
job management app on top of Postgres, unless I can find an existing system
that is equally stable. I've played with the idea of a hybrid system: Register
jobs in Postgres, use RabbitMQ as the trigger mechanism. But this puts a
burden on the processor, which now has to use two APIs.

(On a related note, a big pet peeve of mine is how flaky RabbitMQ is. Being
used to working with Postgres, which is famously stable, I find RabbitMQ
easily the least reliable piece of software in our application stack. We lose
messages every week due to network instability and RabbitMQ bugs. There's
something to be said for Postgres' master/slave design here. It may not be a
cool clustered Erlang project, but it just _works_.)

~~~
jeremyjh
There is not very much relational about jobs; you just need a good persistence
mechanism. Have you seen Gearman ([http://gearman.org](http://gearman.org))?
It is designed explicitly as a job manager.

~~~
vidarh
There's lots of stuff that is relational about jobs the moment you have a
large number of them, and need to be able to search and filter by different
statuses, different users, different sources, date and time, type of job etc.,
or run reports over them (what percentage of jobs are in which states? average
latency to start processing? average latency to completion?)

You can work around that by putting metadata about the jobs into an RDBMS, or
collating it separately, so you can certainly make do without an RDBMS...

Or you could just put them in the RDBMS in the first place and place a few
indexes and optionally triggers to create log entries on state changes.
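
As a sketch of that last point, assuming a hypothetical jobs table with a status column, a trigger-based state-change log might look like:

```sql
CREATE TABLE job_log (
  job_id     integer NOT NULL,
  old_status text,
  new_status text,
  changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE FUNCTION log_job_status() RETURNS trigger AS $$
BEGIN
  -- Record every status transition as an append-only log entry.
  INSERT INTO job_log (job_id, old_status, new_status)
  VALUES (NEW.id, OLD.status, NEW.status);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER job_status_audit
  AFTER UPDATE OF status ON jobs
  FOR EACH ROW EXECUTE PROCEDURE log_job_status();
```

The reports mentioned above then become ordinary queries over job_log.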

~~~
jeremyjh
These all sound like attributes of a job definition - not separate entities to
which a job is related. You are correct they would need to be indexed.

------
DenisM
One could replace locking with optimistic concurrency. Off the top of my head
(MS SQL):

    
    
      -- SETUP
    
      drop table dbo.myqueue
    
      create table dbo.myqueue (
        itemId int identity primary key clustered,
        worker varchar(32) null,
        myguid uniqueidentifier null,
        work_timeout datetime null,
        picture_to_resize nvarchar(128) not null
      )
    
      -- ENQUEUE WORK
    
      insert dbo.myqueue(picture_to_resize) values('Kitten.jpg')
      insert dbo.myqueue(picture_to_resize) values('Doggy.jpg')
    
      -- PROCESS WORK FROM QUEUE
    
      declare @worker varchar(32)  = 'worker-21'
      declare @guid uniqueidentifier = newid()
      update t set 
        t.worker = @worker, 
        t.work_timeout = DATEADD(minute, 30, getdate()),
        t.myguid = @guid
      from 
        (select top 1 * from dbo.myqueue 
          where work_timeout is null or work_timeout<getdate()
          order by itemId asc
        ) t
    
      select * from dbo.myqueue where myguid=@guid
      
      -- If nothing found, client should retry...
    
      -- UPDATE TIMEOUT AS PROGRESS IS MADE
      -- Don't update if another worker stole our job!
    
      update t set t.work_timeout =DATEADD(minute, 30, getdate())
        from dbo.myqueue t where myguid=@guid and t.worker=@worker
    
    
      -- WHEN WORK IS DONE, ITEM SHOULD BE DELETED/ARCHIVED...

~~~
pmboyd
I worked at a mass notification company that used a similar approach, and it
worked well: a handful of workers pulling in batches could push out 300 to 500
messages per second and could queue a quarter million messages in a few
seconds. There wasn't really a need to optimize past that, since throughput on
the rest of the system was bottlenecked by outside components.

The problem with queuing in {insert new hot queueing system} is that
persistence and failover support is often weak. They are by default more
efficient and simpler than an RDBMS. If you're running Snapchat, Redis or
RabbitMQ or whatever is entirely reasonable.

------
craigching
Coming from the WebSphere MQ world, there is a saying that has been around
forever: "Don't use a messaging system as a database." :p Sometimes people
have to be reminded of the obvious, I guess!

------
riking
With regards to the index - did they consider a partial index on (queue,
date_sent) WHERE consumed = false? That way, already-consumed messages aren't
in the index.
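
In Postgres that would be something like this (table and column names assumed from the article's example):

```sql
-- Only unconsumed rows are indexed, so the index stays small even as
-- consumed messages accumulate:
CREATE INDEX messages_pending_idx
  ON messages (queue, date_sent)
  WHERE NOT consumed;
```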

------
tgflynn
My recollection is that Oracle supports notifications, which is essentially an
event system.

I'm surprised that Postgres has not yet implemented something like this. Would
it not make sense to have a notification mechanism integrated into the RDBMS
to avoid the necessity of polling ?

~~~
vidarh
Postgres _does_ have notifications. And the article mentions this:

> Notifications are not part of the SQL standard and most relational databases
> do not support them. There are some exceptions, however. For example,
> PostgreSQL has very decent support for notifications.
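
A minimal example of the mechanism (the channel name is arbitrary):

```sql
-- Consumer session subscribes to a channel:
LISTEN new_message;

-- Producer session signals it, typically right after committing an insert:
NOTIFY new_message, 'queue_a';
```

The payload string is optional; the point is that consumers wake up instead of polling.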

~~~
tgflynn
OK, thanks for that information.

I scanned the documentation TOC and found no mention of them. I guess I would
have thought such a fundamental feature would show up in the TOC but it turns
out you have to dig into the SQL documentation to find the NOTIFY and LISTEN
statements.

If I had used the search feature on the word "notification" I would have found
this, but it's not necessarily obvious that they would actually be called that
(i.e. they could be called events or something else).

------
epeus
The way to do this is to have a field in the message record that says which
consumer it is assigned to; the consumers set that field, process the message,
and then set the consumed flag. Then the DB can stay coherent and show which
consumer processed what.
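
A sketch of that claim-then-ack flow in Postgres (column names are assumptions):

```sql
-- Claim one unassigned message; the outer "consumer IS NULL" re-check means
-- that under READ COMMITTED a concurrent claimer updates zero rows and must
-- retry:
UPDATE messages
SET consumer = 'worker-7'
WHERE id = (SELECT id FROM messages
            WHERE consumer IS NULL AND NOT consumed
            ORDER BY date_sent
            LIMIT 1)
  AND consumer IS NULL;

-- After processing, acknowledge by setting the consumed flag:
UPDATE messages
SET consumed = true
WHERE consumer = 'worker-7' AND NOT consumed;
```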

------
annasaru
The restriction on delivery also implies that a message will not be delivered
twice.

------
lafar6502
I'd say: use an RDBMS for messaging wherever you can. Messages are data, so
keep them where the data belongs. If you have all your data together it's much
easier to do backups/restores or any other maintenance tasks. Also, you want
to be transactional, and you will be if sending/receiving messages is just a
part of a normal database transaction (you don't even need distributed
transactions). RDBMSes are inherently transactional and have done this right;
why replace them with something half-baked and unreliable? And your admins
will be happy too; they don't want to support some exotic messaging
technology. Just make sure you know what you're doing - an RDBMS can be quite
a powerful message queue if you do it right.
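
A sketch of what "sending a message as part of a normal database transaction" can look like (table names here are illustrative):

```sql
BEGIN;
-- The business change...
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- ...and the outgoing message commit or roll back together:
INSERT INTO outbox (topic, payload)
VALUES ('payment_received', '{"account": 1, "amount": 100}');
COMMIT;
```

A worker then drains the outbox table; there is no window where the data changed but the message was lost, or vice versa.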

~~~
beat
Have you read _Enterprise Integration Patterns_, by Gregor Hohpe and Bobby
Woolf? I think it might change your mind on this topic. Messaging isn't the
same thing as static data.

~~~
Avalaxy
I think nobody ever read that book. It's more used as a reference. I tried
reading it, but it was the most boring book I ever encountered (even worse
than structure and interpretation of computer programs, which is also
incredibly boring).

~~~
vidarh
I read it. It's one of my favourite technical books.

------
korzun
I don't get this post. He is comparing RabbitMQ to PostgreSQL and stating that
RabbitMQ is better for messaging out of the box because you don't have to
build anything else.

Duh?

------
alttab
I use MongoDB for everything. It's web scale.

