

Adventures with Disque (in Go) - dvirsky
http://geeks.everything.me/2015/05/03/adventures-with-disque/

======
antirez
Hello, thanks for the post and interest! I added a FAQ in the Disque README
about something mentioned in the post. Cut & pasting here:

Q: When I consume and produce from different nodes, sometimes there is a delay
in order for the jobs to reach the consumer, why?

A: Disque routing is not static: the cluster automatically tries to provide
messages to nodes where consumers are attached. When there is high enough
traffic (even one message per second is enough), nodes remember other nodes
that were recently sources of jobs for a given queue, so they can
aggressively send messages asking for more jobs whenever consumers are
waiting for more messages and the local queue is empty.

However, when the traffic is very low, information about recent sources of
messages is discarded, and nodes rely on a more generic mechanism (which is
also used during high traffic to discover new sources) in order to discover
nodes that may have messages in the queues we need.

For example imagine a setup with two nodes, A and B.

1\. A client attaches to node A and asks for jobs in the queue myqueue. Node A
has no jobs enqueued, so the client is blocked.

2\. After a few seconds another client produces messages into myqueue, but
sending them to node B.

During step 1 if there was no recent traffic of imported messages for this
queue, node A has no idea about who may have messages for the queue myqueue.
Every other node may have, or none may have. So it starts to broadcast
NEEDJOBS messages to the whole cluster. However, we can't spam the cluster with
messages, so if no reply is received after the first broadcast, the next will
be sent with a larger delay, and so forth. The delay is exponential, with a
maximum value of 30 seconds (these parameters will likely be configurable in
the future).

When there is some traffic instead, nodes send NEEDJOBS messages ASAP to other
nodes that were recent sources of messages. Even when no reply is received,
the next NEEDJOBS messages will be sent more aggressively to the subset of
nodes that had messages in the (very recent) past, with a delay that starts at
25 milliseconds and has a maximum value of two seconds.

In order to minimize the latency, NEEDJOBS messages are not throttled at all
when:

1\. A client consumed the last message from a given queue. Source nodes are
informed immediately in order to receive messages before the node asks for
more.

2\. Blocked clients are served the last message available in the queue.

For more information, please refer to the file queue.c, especially the
function needJobsForQueue and its callers.

------
lloeki
> (As a sidenote, 9 parameters for a function call can be a bit annoying in a
> language like Go, that doesn’t support default arguments and method
> overloading)

This trick[0][1] turns out to be very useful.

[0] talk:
[https://www.youtube.com/watch?v=24lFtGHWxAQ&index=15&list=PL...](https://www.youtube.com/watch?v=24lFtGHWxAQ&index=15&list=PLMW8Xq7bXrG58Qk-9QSy2HRh2WVeIrs7e)

[1] slides (with transcript): [http://dave.cheney.net/2014/10/17/functional-
options-for-fri...](http://dave.cheney.net/2014/10/17/functional-options-for-
friendly-apis)

~~~
dvirsky
I used the configuration struct pattern in this case. When you want to call
ADDJOB you use an AddRequest struct, where zero values in the fields mean the
parameters are not used, and the Add method accepts this struct.
[https://github.com/EverythingMe/go-
disque/blob/master/disque...](https://github.com/EverythingMe/go-
disque/blob/master/disque/disque.go#L32)

I want to add some builder style API for it, but didn't get to it yet.

------
hoare
What's the advantage over e.g. NSQ? I'm pretty comfortable with it at the
moment.

~~~
antirez
A few key differences I noticed reading NSQ documentation right now:

1) Disque synchronously (or async if you want) replicates messages across the
cluster, so N-1 nodes can fail (N is the replication factor) and the message
will still be delivered. NSQ is different in that regard, quoting from the
doc: "messages are delivered at least once. Closely related to above, this
assumes that the given nsqd node does not fail."

2) Federation apparently is not built-in but uses "nsqlookupd", however this
may be similar to what Disque provides, I'm not sure. In Disque you can have a
single queue distributed among multiple nodes, it looks like using
"nsqlookupd" this is the same in NSQ.

3) If I understand correctly, NSQ has no support for delayed messages.

4) I'm not sure if in NSQ you can specify a per job retry time for jobs to be
re-delivered automatically, and if a TTL time is also available.

5) I'm not sure if NSQ is able to re-queue the message even without receiving
a negative ACK from the client.

Would be cool to have all this checked by NSQ authors.

~~~
hoare
Point 1) is a strong one, didn't think of that. Thanks for the clarification :)

------
bontoJR
I was wondering if this can actually handle a whole push notification server.
Do you think that it can be handled?

~~~
dvirsky
when it stabilizes - sure. I wrote a little crawler to try it out, it
performed great but seemed to hit some bug when the backlog was huge (around
1G of task backlog). When these things are resolved - sure. But I'd give it
some time.

EDIT: the memory issue was a misconfiguration on my part. Apparently Disque
comes with a maxmemory of 1G by default, and I just exceeded it :)

~~~
bontoJR
> _when it stabilizes - sure_

Thanks. I think I will definitely try to write a push server on it, maybe it's
something that can help to stabilize the system and it can also act as a good
test base.

~~~
dvirsky
Cool. If you do write it in Go, feel free to use my library and I'll help you
if you have any issues with it. I didn't really test it in a real-world
deployment, so it probably hides some issues as well.

