
Redis reliable queues with Lua scripting - wglb
http://antirez.com/post/250?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+antirez+%28antirez+weblog%29
======
pjscott
Pretty slick. A few questions:

* Can you use Lua scripts with blocking operations like BRPOP? Or is the strategy here to just poll if the queue is empty? (Either is a usable strategy.)

* Sometimes you will have particular queue items that cause the processor to die, because of some weird bug. It can be helpful to stick those into a separate "holding area" after a few failures, and send out warning emails to someone who can fix the problem.

* How do you monitor the number of jobs queued and in progress? You could get all the items in the list with LRANGE and look at their timestamps (if any), but if the items are large and there are a few million of them, this could be irritatingly slow. Monitoring is super-important!
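On the monitoring point, a minimal sketch of one cheap option: LLEN returns a list's length in constant time, so polling it avoids pulling every item over the wire the way LRANGE would. The queue names and the `FakeRedis` stand-in below are hypothetical, there only so the example runs without a server; a redis-py client exposes `lpush` and `llen` with the same call shape.

```python
class FakeRedis:
    """In-memory stand-in for the two list commands used here (hypothetical)."""

    def __init__(self):
        self.lists = {}

    def lpush(self, name, *values):
        # Like Redis LPUSH: each value is pushed onto the left end in turn.
        lst = self.lists.setdefault(name, [])
        for value in values:
            lst.insert(0, value)
        return len(lst)

    def llen(self, name):
        # Like Redis LLEN: a missing key counts as an empty list.
        return len(self.lists.get(name, []))


def queue_depths(client, names):
    """Return {queue name: length} using one LLEN call per queue."""
    return {name: client.llen(name) for name in names}


client = FakeRedis()
client.lpush("waiting", "job-1", "job-2", "job-3")
client.lpush("in-progress", "job-0")
depths = queue_depths(client, ["waiting", "in-progress", "holding"])
print(depths)
```

This only gives counts, not item ages; tracking how long a job has been in progress would need timestamps stored somewhere cheaper to read than the items themselves.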

That said, Redis can be really useful for writing queues. At Greplin we're
pushing so much data through a Redis-based queue system that we actually
managed to max out the operations-per-second on EC2 Large instances, and had
to shard across a few servers. The basic idea is that we encode everything as
protocol buffers, LPUSH it into a waiting queue, and have the processor do
something similar to this:

1\. RPOPLPUSH moves the item at the head of the waiting queue into a one-
element in-progress queue. The name of the queue includes a counter for how
many unsuccessful processing attempts have been made (starting at 0).

2\. Process the item.

3\. LPUSH the output into zero or more waiting queues for the next processing
stage, and delete the in-progress queue. Unless there was an error, in which
case things get complicated. :-(
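The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the actual Greplin code: the key names, the `max_attempts` limit, and the "holding" list for items that keep failing are all made up for the example, and `FakeRedis` is a hypothetical in-memory stand-in so the sketch runs without a server. The method names mirror redis-py's `lpush`, `rpoplpush`, and `delete`.

```python
class FakeRedis:
    """In-memory stand-in for the list commands used below (hypothetical)."""

    def __init__(self):
        self.lists = {}

    def lpush(self, name, *values):
        lst = self.lists.setdefault(name, [])
        for value in values:
            lst.insert(0, value)
        return len(lst)

    def rpoplpush(self, src, dst):
        # Pop from the right end of src, push onto the left end of dst.
        src_list = self.lists.get(src)
        if not src_list:
            return None
        item = src_list.pop()
        if not src_list:
            del self.lists[src]  # Redis drops a list once it is empty
        self.lists.setdefault(dst, []).insert(0, item)
        return item

    def delete(self, *names):
        return sum(1 for n in names if self.lists.pop(n, None) is not None)


def process_one(client, worker, handler, next_queue, max_attempts=3):
    """Run one item through steps 1-3; returns 'empty', 'done', or 'held'."""
    attempts = 0
    in_progress = f"in-progress:{worker}:{attempts}"
    # Step 1: move the oldest waiting item into a one-element list.
    item = client.rpoplpush("waiting", in_progress)
    if item is None:
        return "empty"
    while True:
        try:
            outputs = handler(item)  # Step 2: process the item.
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                # Assumed policy: park repeat offenders for a human to inspect.
                client.rpoplpush(in_progress, "holding")
                return "held"
            # Record the failure by moving the item to a list whose name
            # carries the new attempt count, then retry.
            bumped = f"in-progress:{worker}:{attempts}"
            client.rpoplpush(in_progress, bumped)
            in_progress = bumped
            continue
        # Step 3: hand the outputs to the next stage and clear in-progress.
        for out in outputs:
            client.lpush(next_queue, out)
        client.delete(in_progress)
        return "done"


def handler(item):
    if item == "boom":
        raise ValueError("killer item")
    return [item.upper()]


client = FakeRedis()
client.lpush("waiting", "a", "boom")
results = [process_one(client, "w1", handler, "stage2") for _ in range(3)]
print(results, client.lists)
```

Note that in this sketch the push of outputs and the delete of the in-progress list are separate commands, so a crash between them could leave an item both delivered and still marked in progress; closing gaps like that is the kind of thing a Lua script or MULTI/EXEC is for.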

There are a surprising number of complicated little issues that come up when
making queueing systems for heavy production use! For example, how do we
coordinate the topology, and which servers to connect to? If one of our EC2
instances goes down and takes a Redis master with it, how do we seamlessly
switch over to the slave? How do we shunt aside "killer" items for later
processing after we've fixed the bugs that they were causing? What happens
when there's a load spike and queue length goes through the roof, and the
Redis servers start using up all their memory?

These problems all have solutions -- for example, you can build most of the
coordination and rate-limiting stuff with Zookeeper -- and for the most part
I've been very happy with Redis. It's been very solid and reliable, with
convenient operations, easy administration, and great speed! I just think that
queueing systems are an area ripe for improvement.

------
wslh
Queues are now commonplace. But we need to ask a lot of questions before
using "reliable queues":

i) Does it have acknowledgments?

ii) Does it support transactions?

iii) Does it have persistence?

iv) Does it handle contention?

And that is before we even get to AMQP, interoperability, and many other
questions.

------
freeman478
Nice example of how Lua scripting will improve the usability of Redis!

A nitpick: the queue described is First In, First Out (which is the natural
ordering for a queue).

