

Three fundamental tricks for developers writing distributed systems - pedrobelo
http://pedro.herokuapp.com/past/2012/12/3/three_fundamental_tricks_for_developers_writing_distributed_systems

======
ameyamk
Often the idea behind a distributed system is to avoid a single point of
failure. By using the database for communication, you are essentially creating
another single point of failure in the form of the database (unless it is
running on some highly reliable, elaborate master-master setup). However, you
can use systems like ZooKeeper to get similar functionality and to facilitate
communication.

------
i0exception
Isn't this the same as a message queue? Why would you want to rewrite it
using a database? Also, point/trick 2 seems unnecessary if you are using trick 3
(idempotent jobs). By queuing job IDs, you now have a consistency dependency
between your message queue and your database.

~~~
pedrobelo
When using a queue that isn't database-based, you'll have to find another
mechanism to make sure your operation is still atomic.

In other words, you can end up in a situation where a record is inserted but
the job to work on it is not enqueued, or worse, where an insert fails but the
job to work on it is enqueued.

Point 2 is still necessary despite idempotency, IMO: let's say some value is
updated to "a" and then to "b", enqueuing two jobs. If the job carrying "b"
runs before the one carrying "a", your receiver will end up with the wrong
value. The same happens if the initial request to update "a" fails and is
retried later.

~~~
amalter
It seems to me that XA transactions are one of those patterns that need to be
rediscovered every generation. I see folks start with 0mq or Redis and hit
edge cases where messages get lost.

I'd love to see a somewhat simple distributed transaction standard for HTTP
APIs emerge.

There has to be a middle ground between the fiddly bits of SOAP's
WS-ReliableMessaging or a full-on JMS broker and ad-hoc Redis queues.

------
fusiongyro
The one thing that concerns me about using the database as a queue is that
MVCC doesn't really lend itself to writing threading primitives like locks.
I'm curious how one would go about writing a queue in an MVCC architecture.
Off the top of my head, I guess you could have a job assignments table to link
processors and jobs, make the job FK unique, and interpret forced rollbacks as
indicating that another thread grabbed the job before you did. Then again, if
your queue only runs idempotent functions, it wouldn't matter if more than one
thread did the same work; it would just be a waste of time.

~~~
amalter
This was one of the main reasons I was given for why DB-based queues are a
code smell.

The other was that a queue's insert/select pattern hammers a DB's index
maintenance and data page fragmentation algorithms. Does anyone know if this is
still the case, or am I repeating an old wives' tale?

~~~
fusiongyro
There are so many factors that go into database performance, but I would
expect high-frequency queuing to be problematic even today for the reasons
you mention. If the load is low, it's probably fine.

------
NathanKP
For a while I was using my own custom-written job distribution system built on
the database, but then I discovered Gearman. Since switching to it, I have
seen increased reliability and been more productive when creating new jobs.

I do not recommend rolling your own system. Use an existing server that is
built for the task. Amazon SQS is also a good solution.

------
tbrownaw
Just what sort of "distributed" is this talking about, that there's a central
DB to put the queue in?

~~~
pedrobelo
I'm not advocating that a queue (or any database, for that matter) be shared
between apps!

------
mtaubman
What pun?

~~~
fusiongyro
I think he means "enjoy the benefits of acid" as in LSD. Not much of a pun.

