

NSQ: Realtime distributed message processing at scale (in Go) - matticakes
https://github.com/bitly/nsq

======
jallmann
Was something like MQTT investigated prior to re-inventing the wheel with NSQ?
From a cursory examination, I don't see any major advantages other than nice
UI tools.

The wire protocol seems more complicated -- why distinct protocols for
producers and consumers? Linebreak-delimited headers are error-prone and ought
to be banished. Why transmit the hostname with sub requests -- is another
system requesting jobs on behalf of workers?

Then the topic/channel distinction seems artificial when wildcards would
suffice (and provide much more flexibility), eg topic/*/channels or
topic/channels/# in the MQTT parlance. MQTT also has more fine-grained
delivery guarantees via its QoS levels. All this with a header structure
that's as small as two bytes.

edit: While MQTT is pubsub, as long as the system is under your control,
you're free to change the semantics at the broker side from "broadcast
messages to all consumers" to "rotate messages among consumers."
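
For reference, MQTT's wildcards are `+` (exactly one topic level) and `#` (all remaining levels). Here is a minimal sketch of that matching rule in Go. The `matchTopic` function and the topic names are made up for illustration, and the sketch ignores details such as `$`-prefixed system topics.

```go
package main

import (
	"fmt"
	"strings"
)

// matchTopic reports whether topic matches an MQTT-style filter.
// "+" matches exactly one level; "#" matches all remaining levels
// and is only accepted as the last level of the filter.
func matchTopic(filter, topic string) bool {
	f := strings.Split(filter, "/")
	t := strings.Split(topic, "/")
	for i, level := range f {
		if level == "#" {
			return i == len(f)-1
		}
		if i >= len(t) {
			return false
		}
		if level != "+" && level != t[i] {
			return false
		}
	}
	return len(f) == len(t)
}

func main() {
	fmt.Println(matchTopic("clicks/+/archive", "clicks/us/archive")) // true
	fmt.Println(matchTopic("clicks/#", "clicks/us/archive"))         // true
	fmt.Println(matchTopic("clicks/+", "clicks/us/archive"))         // false
}
```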

~~~
matticakes
one of the developers here...

The protocols that exist in NSQ now are designed to be the simplest
implementation that worked.

You've correctly pointed out some of the issues. At this stage, the
distinction of producer vs consumer was mostly so that you could publish _at
all_ without having to use the HTTP interface. For our use cases, in
particular taking advantage of the /mput endpoint, we aren't even using the
TCP based publishing protocol.

Re: your point on sending metadata with SUB commands. I agree it's a bit ugly.
We actually intend on improving that aspect by instead sending the data in the
form of an IDENTIFY type command upon initial connection. That information is
used in the various administrative UIs and endpoints.

I'm going to do a bit more reading on MQTT, thanks for the link.

~~~
old_sound
I'm really curious to know why you would build this tool instead of using,
say, RabbitMQ?

DISCLAIMER: I wrote the RabbitMQ in Action book.

~~~
matticakes
We looked into AMQP based solutions. Our understanding is that slaving,
master-slave failover, or other strategies are used to _mitigate_ the fact
that there is a broker responsible for a given stream of messages.

We wanted a brokerless, decentralized, and distributed system. NSQ has no
broker or SPOF so it is able to route around failures.

That said, I think RabbitMQ is a good tool depending on your requirements. I
can imagine a broker proving useful in situations where you may want strict
ordering or no-duplication. Those were tradeoffs we were willing to make for
fault tolerance and availability.

Also, given the fact that we were already operating infrastructure built on
simplequeue (which is also distributed and brokerless) we found it more
appealing to evolve rather than replace.

~~~
rabbitmq
Master-slave etc are HA techniques for replicating state eg queues (or
"channels" as nsq calls them?). "Decentralized and distributed" has nothing to
do with state, master/slave or any of that; it's a topological property of
your system. You can certainly build a decentralized and distributed system
using a set of brokers, regardless of whether they support HA.

~~~
matticakes
That's fair, you certainly can use rabbitmq in a distributed and decentralized
fashion.

You bring up an important point though... What is important is the topology
we're promoting and the platform we've built to support it. The actual queue
is less important and it made sense for us to own that piece to achieve our
goals.

~~~
rabbitmq
So, why not just deliver that topology using off the shelf components? Rabbit,
Kafka, ZeroMQ, ... the toolbox is deep. This is not meant as a criticism, I am
simply trying to understand what I am missing when I look at your design :-)

------
matan_a
We're currently using Apache Kafka which is working out great, but does
include a broker as a middleman. This is good for us since it's designed to
accumulate unconsumed data (or even roll it back) and still provide good
performance - rather than having that happen on the producers or consumers.

It would be cool to see some stats on throughput and latency relative to # of
producers / consumers and amount of data currently accumulated in the
producers (since there is no middleman).

------
bgentry
_This ensures that the only edge case that would result in message loss is an
unclean shutdown of an nsqd process. In that case, any messages that were in
memory (or any buffered writes not flushed to disk) would be lost._

I don't understand how you can call it a message delivery "guarantee" when
you're susceptible to losing messages when a node dies.

_One solution is to stand up redundant nsqd pairs (on separate hosts) that
receive copies of the same portion of messages._

OK, delivery is only guaranteed if I run multiple independent sets of NSQd and
write messages to both.

Regardless, it looks like an interesting project.

~~~
jlgreco
How would you design a system that didn't have this problem at the individual
node level?

~~~
dclusin
Sequence numbers on messages sent by producers coupled with some sort of
persistence by the sender. This way when a gap is detected some sort of resend
protocol can be implemented to achieve the delivery guarantee.
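
A minimal sketch of the receiving side of such a scheme, assuming each producer stamps its messages with a sequence number that starts at 1 and increases by one. The `receiver` type and its `accept` method are hypothetical names for illustration, not part of NSQ.

```go
package main

import "fmt"

// receiver tracks in-order delivery for a single producer.
type receiver struct {
	next    uint64            // next sequence number expected in order
	pending map[uint64]string // messages that arrived ahead of a gap
}

func newReceiver() *receiver {
	return &receiver{next: 1, pending: make(map[uint64]string)}
}

// accept records a message and returns the sequence numbers below seq
// that are still missing, i.e. what the receiver would ask to be resent.
func (r *receiver) accept(seq uint64, body string) []uint64 {
	if seq < r.next {
		return nil // duplicate of something already delivered
	}
	r.pending[seq] = body
	// Advance past every message that is now contiguous.
	for {
		if _, ok := r.pending[r.next]; !ok {
			break
		}
		delete(r.pending, r.next)
		r.next++
	}
	var gaps []uint64
	for s := r.next; s < seq; s++ {
		if _, ok := r.pending[s]; !ok {
			gaps = append(gaps, s)
		}
	}
	return gaps
}

func main() {
	r := newReceiver()
	r.accept(1, "a")
	fmt.Println(r.accept(4, "d")) // [2 3]
}
```

The sender would keep unacknowledged messages in its own persistent store until the receiver confirms them.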

~~~
jlgreco
So the sender just keeps trying until it gets an ACK. Fair enough, though I
imagine you'd want to run multiple nodes even with this.

~~~
dclusin
Correct. Reliability and Availability are two separate but related goals.

------
wheaties
I love seeing how people tackle these sorts of problems. Here, the trade-off
is duplicate messages reaching a client. I know others have tried a no-"ack"
approach. I'm still working in the land of RabbitMQ solving all my business
needs and I like it.

I'd love to see some numbers, reasons for decisions made, and suggested best
practices for this solution other than "manual de-dupe."

~~~
jehiah
Some of the background behind our decisions is in our blog post
<http://word.bitly.com/post/33232969144/nsq>

In terms of manual de-duping, we strive internally for idempotent message
processing so it's fairly irrelevant, but to handle cases where it matters,
all of our messages have unique IDs added to them outside of NSQ.

The actual cases where messages are handled multiple times are limited to when
a client disappeared during message processing (a hard restart) or it passed
the allowable time window to respond and was given to another client.
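
A minimal sketch of consumer-side de-duplication keyed on such an ID, for the cases where idempotent processing isn't an option. The `deduper` type is hypothetical; it keeps every ID in memory and never expires entries, which a real consumer would need to handle.

```go
package main

import (
	"fmt"
	"sync"
)

// deduper remembers which message IDs have already been handled.
type deduper struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newDeduper() *deduper {
	return &deduper{seen: make(map[string]struct{})}
}

// firstTime records id and reports whether it had not been seen before.
func (d *deduper) firstTime(id string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.seen[id]; ok {
		return false
	}
	d.seen[id] = struct{}{}
	return true
}

func main() {
	d := newDeduper()
	for _, id := range []string{"a1", "b2", "a1"} {
		if d.firstTime(id) {
			fmt.Println("processing", id)
		} else {
			fmt.Println("skipping duplicate", id)
		}
	}
}
```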

If there are any specific numbers you are curious about, please ask.

~~~
old_sound
When people say "billions of messages per day" do they realize that 1 billion
messages =~ 11500 msgs/sec? RabbitMQ, as well as any self-respecting message
broker, is able to handle that on a single machine.
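
The arithmetic behind that figure, as a quick check. This is a daily average only; it says nothing about peak rates. The `perSecond` helper is just for illustration.

```go
package main

import "fmt"

// perSecond converts a daily message count to an average per-second rate.
func perSecond(perDay float64) float64 {
	const secondsPerDay = 24 * 60 * 60 // 86,400
	return perDay / secondsPerDay
}

func main() {
	fmt.Printf("%.0f msgs/sec\n", perSecond(1e9)) // 11574 msgs/sec
}
```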

------
jordanlewis
It's kind of like Storm. I'd love to see a comparison written by the authors.

~~~
matticakes
Agreed, a comparison would be helpful for highlighting the tradeoffs behind
the choices we made.

We do talk about the evolution of our infrastructure and the genesis of NSQ in
our blog post, <http://word.bitly.com/post/33232969144/nsq>.

------
jasonmoo
Hey it's awesome to see you guys tackling a larger project in Go. I've been
writing some smaller services/tools with it in my day job, and I like it a
lot.

Thanks for sharing your work.

------
dikbrouwer
How does this compare to ZeroMQ?

~~~
jeremyjh
ZeroMQ is a library, it is embedded in the application code and can support a
peer-to-peer messaging topology. It also has support for Broker patterns but
you have to develop the Broker application yourself; you could build a
solution such as this one using it, but the zmq library itself really handles
lower-level concerns with a high degree of flexibility but no out-of-the-box
functionality. zmq can be thought of as something as simple as an improved
socket library but with support for async, storing & high-water marks when
there are no attached consumers, various distribution strategies, queues and
lots more.

~~~
dikbrouwer
Right, I guess my question more specifically is: ZeroMQ allows you to build
applications like this in (seemingly) just a few lines of code. Did you guys
consider building it on top of ZeroMQ, and if so, what made you decide not to
use it? Nothing against what you've built btw, it looks great, I'm just
curious.

~~~
matticakes
The truth is we wanted to use ZeroMQ initially. It would have been really
awesome to have all that flexibility on the client side.

The ZeroMQ documentation is fantastic as well. It inspired and helped shape
some of the design choices we made.

The further along we got from design to implementation, the more obvious it became that
it would be important to "own" the socket. Generally speaking, this is exactly
what ZeroMQ prevents you from doing (and rightly so, it aims to abstract all
of that away).

The choice to use Go had an impact here as well. Language features like
channels and the breadth of the standard library made it really easy to
translate our NSQ design into working code, offsetting the benefit of ZeroMQ's
abstractions.

~~~
dikbrouwer
Makes a lot of sense, thanks for sharing!

------
overbroad
I'm too lazy to install Go just to try this (I would need to, right?) but
reading through the docs this looks far more appealing to me than tent.io.
Nice work, at least with respect to the general design. Flexible.

~~~
jehiah
Thanks for the reminder. Getting binary builds up is on our to-do list, so
check back soon for those.

In the meantime, installing Go is pretty easy (you can use brew, or the
official OSX package (assuming you are on OSX)
<http://golang.org/doc/install>) and we've tried to leave clear steps for
building NSQ here <https://github.com/bitly/nsq/blob/master/INSTALLING.md>
(there really are very few dependencies other than Go itself).

~~~
overbroad
Cheers. I have installed Go before and messed around with it. I agree it's
pretty good as far as being self-contained and straightforward. I just don't
have the energy to keep chasing a moving target. For my purposes, I'm more
interested in stuff that is stable and rarely changes.

If you can produce binaries for UNIX (Linux, BSD, etc), OSX and Windows, I'll
be impressed. That's something I was interested in doing with Go (is it
possible to cross-compile?) but never managed to learn.

~~~
4ad
Since the Go1 release, Go is not a moving target anymore. Stable releases are
rare and there are binary packages available if you are so inclined.

Go compilers, being derived from Plan 9, are always cross compiling. To cross
compile you just set and export GOARCH and GOOS to your target, for example:

    GOOS=windows GOARCH=386 CGO_ENABLED=0 go build foo.bar/baz

This would build foo.bar/baz for 32-bit Windows from any system that has Go and
has run make.bash --no-clean with those variables set. More interesting is building
for ARM:

    GOOS=linux GOARCH=arm CGO_ENABLED=0 go build foo.bar/baz

------
arjn
We use ActiveMQ at work and are investigating other MQs too. I don't see any
info or performance graphs w.r.t. message size. That would be a useful
addition.

~~~
jehiah
Good idea, we will try to put together some numbers with regards to message
size.

------
interg12
Anyone at Disqus care to weigh in on this?

------
andreypopp
It would be interesting to know whether the bitly guys had a look at beanstalkd before implementing nsqd.

~~~
matticakes
beanstalkd is a great project and it certainly inspired some of our thinking.
I think of beanstalkd as a more fully functional alternative to simplequeue
(what we built/used prior to NSQ - more details in our blog post
<http://word.bitly.com/post/33232969144/nsq>).

Some of the goals of NSQ transcended just replacing our specific daemon that
buffered and delivered messages (most importantly the interactions with the
lookup service). Because of that, we felt that owning that piece would make it
easier to achieve those goals.

Additionally, one of the most important properties of nsqd (the queue
component of NSQ) is that data is pushed to the client rather than polled
(like in beanstalkd).

------
stevvooe
Where is the guy with the comment about generics? Disappointing.

------
knodi
Does this have a gob interface?

~~~
matticakes
It is data format agnostic, so you can certainly serialize using the gob
package.
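
A minimal sketch of what that could look like, using only the standard `encoding/gob` package to turn a struct into an opaque message body and back. The `Click` type and the `encode`/`decode` helpers are hypothetical; how the resulting bytes get published is left out.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Click is a made-up payload type for illustration.
type Click struct {
	URL  string
	Hits int
}

// encode serializes c with gob into bytes usable as a message body.
func encode(c Click) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(c); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reverses encode on the consumer side.
func decode(body []byte) (Click, error) {
	var c Click
	err := gob.NewDecoder(bytes.NewReader(body)).Decode(&c)
	return c, err
}

func main() {
	body, err := encode(Click{URL: "http://example.com/", Hits: 3})
	if err != nil {
		panic(err)
	}
	c, err := decode(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes -> %+v\n", len(body), c)
}
```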

~~~
knodi
Amazing, with this my whole infrastructure will be 100% Go.

