

Dynamiq – A simple implementation of a queue on top of Riak 2.0 - mpc
https://github.com/Tapjoy/dynamiq

======
orthecreedence
> You should account for this in your design by either managing your own de-
> dupe solution (such as using Memcache to hold the keys you've seen from a
> given queue, expiring with the visibility timeout on the queue) or design a
> system which self-defends against duplicate messages.

At that point, why not just use beanstalkd or redis? You're already
centralizing part of your queue elsewhere, and the point of Riak is
decentralization.

However, if you _can_ tolerate duplicate messages, then this seems like a cool
system. I didn't get from the README how it actually orders the jobs/messages
though. I get it uses an index range scan, but on what values? Is it an ID,
and how does the client generate these?

~~~
StabbyCutyou
Hi, thanks for the feedback.

Before I respond, let me say that I'm one of the core developers of the
system, and that I am a big fan of Redis, and conceptually of Beanstalkd
(never used it, but I know someone who is a big fan of it).

The reason you wouldn't use Redis or Beanstalkd is that they are inherently
single-node systems. Things exist to make them behave in a distributed fashion
(for Redis, anyway; not sure about Beanstalkd?), but they are not distributed
by design. You're not going to scale them past a single box / node (although I
will admit, I am not up on my Beanstalkd news, so possibly they've made strides
there?).

Dynamiq leverages the amazing work done by Basho to build a rock solid
distributed data store. By providing a light layer of coordination and logic
at the edge of the system, it allows you to treat Riak (in all of its AP
glory) as something like a queue (this dovetails into your question about
order, below). Riak is distributed to the core, and Dynamiq uses that to its
advantage. Need more capacity? Add more nodes. It'll handle the rest.

On the subject of dupes, they are technically "rare" once the system is
running at an even keel. Only when nodes enter / leave or when you alter the
configuration of a queue will you be likely to see dupes. Otherwise, the only
"dupes" you'd see would be when the timeout expires and "out for delivery"
messages become naturally available again (but we do not consider those
"dupes").

The system in no way shape or form guarantees, offers, or even implies order.
You will likely get a very loose order so long as you are always keeping up
with the rate of messages in, but that's it. Never assume order, and in general
you should strive to build systems that are resilient in the face of a lack of
order.

The range scan operates over an ID that Dynamiq itself assigns to the message,
by generating a random int64 using Go's crypto/rand package. The client cannot
specify the ID. However, and this is what we do internally, we assign the
message an internal, application-specific ID prior to publishing it to
Dynamiq. So each message ultimately ends up with 2 IDs: one for Dynamiq's own
use, which you use to ACK once you're done, and one application-specific ID
which may or may not mean anything to the consuming service.
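
As a rough illustration of the two-ID scheme described above, here is a minimal sketch of drawing a random int64 from Go's `crypto/rand` and pairing it with an application-supplied ID. This is not Dynamiq's actual code: the `Message` type, its field names, and `newMessage` are assumptions made up for the example.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// randomInt64 reads 8 bytes from the system CSPRNG and interprets them
// as a signed 64-bit integer (so the result may be negative).
func randomInt64() (int64, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return int64(binary.BigEndian.Uint64(b[:])), nil
}

// Message is a hypothetical envelope carrying both IDs from the thread:
// one assigned by the queue (used to ACK) and one chosen by the application.
type Message struct {
	QueueID int64  // random, assigned on publish; the client cannot set it
	AppID   string // application-specific; opaque to the queue
	Body    string
}

// newMessage mimics what a publish call might do: keep the caller's
// application ID and attach a freshly generated queue ID.
func newMessage(appID, body string) (Message, error) {
	id, err := randomInt64()
	if err != nil {
		return Message{}, err
	}
	return Message{QueueID: id, AppID: appID, Body: body}, nil
}

func main() {
	m, err := newMessage("order-42", "hello")
	if err != nil {
		panic(err)
	}
	fmt.Printf("queue id %d, app id %s\n", m.QueueID, m.AppID)
}
```

Because the queue IDs are random rather than sequential, a range scan over them yields messages in no meaningful order, which is consistent with the "never assume order" advice above.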

~~~
orthecreedence
Cool, thanks for the answers, and great work. I understand there's a trade-off
between distribution and strictness; it seems like Dynamiq actively chooses
distribution, which is actually a very cool direction to see a queue take
(most or all queues I know of tend to favor consistency).

To answer your questions on beanstalkd, the answer to scaling past a single
node is sharding, much like a traditional SQL database (although no
replication exists). There really is no HA option. Redis is similar, although
it does support replication, so you do have more failover options.
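
For what it's worth, the client-side sharding mentioned above can be sketched like this: hash some key for the job and use it to pick one of several independent servers. The host names are made up, and `shardFor` is a hypothetical helper, not part of any beanstalkd or Redis client.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a key to one of the given shard addresses using an
// FNV-1a hash. It returns "" when no shards are configured.
func shardFor(key string, shards []string) string {
	if len(shards) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return shards[h.Sum32()%uint32(len(shards))]
}

func main() {
	// Hypothetical addresses of three independent queue servers.
	shards := []string{"queue-a:11300", "queue-b:11300", "queue-c:11300"}
	for _, key := range []string{"user-1", "user-2", "user-3"} {
		fmt.Println(key, "->", shardFor(key, shards))
	}
}
```

With plain modulo hashing, adding or removing a server remaps most keys, and a lost shard takes its jobs with it, which is the "no HA option" problem.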

