
NSQ – A realtime distributed messaging platform designed to operate at scale - loppers92
https://github.com/nsqio/nsq
======
rjeli
Segment is probably the biggest NSQ user right now, and they're moving to
Kafka - any employees want to weigh in? :)

~~~
TheHydroImpulse
Engineer @ Segment

NSQ has served us pretty well, but long-term persistence has been a massive
concern for us. If any of our NSQ nodes go down, it's a big problem.

Kafka has been far more complicated to operate in production, and developing
against it requires more thought than NSQ (where you can just consume from a
topic/channel, ack the message and be done). On top of that, with NSQ, if you
want more capacity you can just scale up your services and be done. With Kafka
we had to plan how many partitions we needed, and autoscaling has become a bit
trickier.
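
To illustrate why partition planning affects autoscaling: within a single Kafka consumer group, each partition is consumed by at most one consumer, so consumers beyond the partition count sit idle. The following is a minimal sketch of that constraint, not Kafka's actual assignor; `assign_partitions` and `active_consumers` are hypothetical helper names, and real assignment strategies distribute partitions differently.

```python
def assign_partitions(num_partitions, consumers):
    """Toy round-robin assignment of partitions to consumers in one group.

    Assumption: each partition goes to exactly one consumer, which is the
    property that caps parallelism. Real assignors differ in the details.
    """
    if not consumers:
        return {}
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def active_consumers(assignment):
    """Consumers that received at least one partition."""
    return [c for c, parts in assignment.items() if parts]


# Scaling from 4 to 6 consumers on a 4-partition topic adds no throughput:
group = [f"consumer-{i}" for i in range(6)]
assignment = assign_partitions(4, group)
print(active_consumers(assignment))
# ['consumer-0', 'consumer-1', 'consumer-2', 'consumer-3']
```

In this toy model, adding consumers only helps until their number reaches the partition count, which is why the partition count has to be chosen with peak scale in mind.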

We now have critical services running against Kafka and started moving our
whole pipeline to it as well. It's a slow process but we're getting there.

We've had to build some tooling to operate Kafka and ramp up everyone else on
how to use it. To be fair, we've also had to build tooling for NSQ,
specifically nsq-lookup to allow us to scale up.

We have an nsq-go library that we use in production along with some tooling:
[https://github.com/segmentio/nsq-go](https://github.com/segmentio/nsq-go)

~~~
doh
Have you ever looked at any proprietary solutions like Google's PubSub? We've
been running on PubSub for over a year now and, outside of some unplanned
downtime, it's scaling very well. But as we're looking to branch out of GCP, we
are looking at Kafka as an alternative.

Could you comment on particular problems and challenges that you ran into?

For context, we're currently sending around 60k messages/sec, and around 1k
of them contain data larger than 10 KB.

~~~
TheHydroImpulse
The biggest issue with PubSub and Amazon's alternative is the cost. Being
capped at a per-message cost would be a no go.
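
As a rough back-of-the-envelope sketch of how a per-message price scales, using the 60k messages/sec figure from the parent comment: the price below is a purely hypothetical placeholder, not an actual PubSub or AWS price, and real pricing may be based on data volume rather than message count.

```python
SECONDS_PER_DAY = 86_400


def messages_per_month(rate_per_sec, days=30):
    """Total messages sent in a month at a constant rate."""
    return rate_per_sec * SECONDS_PER_DAY * days


def monthly_cost(rate_per_sec, price_per_million, days=30):
    """Monthly bill for a flat per-message price (hypothetical pricing model)."""
    return messages_per_month(rate_per_sec, days) / 1_000_000 * price_per_million


print(messages_per_month(60_000))  # 155520000000

# HYPOTHETICAL price per million messages, for illustration only.
ASSUMED_PRICE_PER_MILLION = 0.40
print(round(monthly_cost(60_000, ASSUMED_PRICE_PER_MILLION), 2))
```

The point is only that the bill grows linearly with message count, so a sustained high message rate dominates the cost regardless of the exact price.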

If you can get away with using PubSub or the like, it would be far easier than
managing your own Kafka deployment (correctly).

If data loss is unacceptable then Kafka is basically the only open-source
solution that is known for not losing data (if done correctly of course). NSQ
was great but lacked durability and replication. We can guarantee that two or
more Kafka brokers persisted the message before moving on. With NSQ, if one of
our instances died it was a big problem.
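
The "two or more brokers persisted the message" guarantee can be sketched as a write that only succeeds when enough replicas are available to store it. This is a toy in-memory model, not Kafka's replication protocol; `Replica`, `produce`, and `NotEnoughReplicas` are hypothetical names. In a real deployment the comparable knobs are the producer setting `acks=all` together with the topic or broker setting `min.insync.replicas=2`.

```python
class NotEnoughReplicas(Exception):
    """Raised when fewer than the required number of replicas can persist."""


class Replica:
    """Toy broker replica holding an in-memory log."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.log = []


def produce(replicas, message, min_acks=2):
    """Persist `message` on every live replica, or fail without writing.

    Assumption: a write is acknowledged only if at least `min_acks`
    replicas are alive to store it.
    """
    live = [r for r in replicas if r.alive]
    if len(live) < min_acks:
        raise NotEnoughReplicas(f"{len(live)} live replicas, need {min_acks}")
    for r in live:
        r.log.append(message)
    return len(live)


brokers = [Replica("b1"), Replica("b2"), Replica("b3", alive=False)]
print(produce(brokers, "event-1"))  # 2: stored on b1 and b2

brokers[1].alive = False
try:
    produce(brokers, "event-2")
except NotEnoughReplicas as err:
    print("rejected:", err)
```

The producer sees the failure and can retry, instead of believing a message is safe when only one copy exists.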

Managing Kafka in a cloud environment hasn't been easy and required a lot of
investment and we have yet to move everything over to it.

~~~
ngrilly
> If data loss is unacceptable then Kafka is basically the only open-source
> solution that is known for not losing data

What about RabbitMQ?

~~~
doh
We used RabbitMQ extensively for almost two years but the problems we were
encountering along the way weren't worth it. We ended up talking to the dev
team too often to solve catastrophic issues that took down our whole
production for hours.

We considered using it again for synchronous RPC communication when we were
replacing gRPC, but ended up going with nats.io instead. It has fewer
features, but we are able to squeeze much more juice out of a smaller stack.

~~~
derekperkins
Why were you replacing gRPC?

~~~
doh
gRPC is great, but it has a ton of small problems, including catastrophic
documentation (in some cases we had to read byte code to figure out what to
do).

The biggest issue for us, however, was that there is no middle server that
could handle and route connections to available workers. We were using haproxy,
which worked OK but was far from great. It was very hard to figure out how many
servers needed to run at any given point, and thus a ton of our requests were
ending up with an UNAVAILABLE response.

Essentially, what we needed was synchronous RPC over PubSub, which gRPC doesn't
offer.
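
Synchronous RPC over pub/sub is commonly built as request/reply: the caller subscribes to a one-off reply subject, publishes the request with that subject attached, and blocks until a worker answers. Below is a minimal in-memory sketch of the pattern using only the standard library. `Broker`, `start_worker`, and the subject names are hypothetical, and this is not the nats.io client API. The toy broker fans out to every subscriber, so it does not load-balance across workers.

```python
import queue
import threading
import uuid
from collections import defaultdict


class Broker:
    """Tiny in-memory pub/sub broker: every subscriber gets its own queue."""

    def __init__(self):
        self._subs = defaultdict(list)
        self._lock = threading.Lock()

    def subscribe(self, subject):
        q = queue.Queue()
        with self._lock:
            self._subs[subject].append(q)
        return q

    def unsubscribe(self, subject, q):
        with self._lock:
            self._subs[subject].remove(q)
            if not self._subs[subject]:
                del self._subs[subject]

    def publish(self, subject, payload, reply_to=None):
        with self._lock:
            targets = list(self._subs.get(subject, []))
        for q in targets:
            q.put((payload, reply_to))

    def request(self, subject, payload, timeout=1.0):
        """Blocking call: publish, then wait on a unique reply subject."""
        inbox = "_INBOX." + uuid.uuid4().hex
        q = self.subscribe(inbox)
        try:
            self.publish(subject, payload, reply_to=inbox)
            reply, _ = q.get(timeout=timeout)  # raises queue.Empty on timeout
            return reply
        finally:
            self.unsubscribe(inbox, q)


def start_worker(broker, subject, handler):
    """Subscribe first, then answer requests on a daemon thread."""
    q = broker.subscribe(subject)

    def loop():
        while True:
            payload, reply_to = q.get()
            if reply_to is not None:
                broker.publish(reply_to, handler(payload))

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


broker = Broker()
start_worker(broker, "math.double", lambda n: n * 2)
print(broker.request("math.double", 21))  # 42
```

If no worker is subscribed, the request times out rather than failing at connection time, so the caller never needs to know where the workers are.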

------
matticakes
I'm one of the original authors, happy to answer any questions.

------
agentultra
There is quite a bit of documentation on the design but I haven't seen
anything more specific along the lines of a TLA+, Lean, etc specification.

There are plenty of projects like this and I'm curious how they go about
creating specifications, checking their designs, etc.

Would the project benefit from a formal model or proofs? A colleague and I
started a side project to provide specifications for core OpenStack components
but we're keeping our minds open to other projects as well.

------
est
nsqadmin was written in Backbone; it also requires statsd and Graphite,
which are kinda obsolete these days.

------
je42
What are the typical use cases for NSQ?

~~~
ryan_lane
A use case that I really like is as a sidecar on every EC2 instance for local
async store-and-forward, where you need to deliver data somewhere, but want to
be able to handle large bursts of traffic without a massive spike in latency.

This use case assumes you generally don't trust the network and really, really
don't want to block if the network is temporarily flaky.
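
The store-and-forward idea can be sketched as a local buffer that accepts writes without touching the network, plus a separate flush step that forwards in order and stops at the first failure. This is a minimal in-memory illustration, not how nsqd is implemented; `StoreAndForward` and the `send` callable are hypothetical, and a real sidecar would also persist to disk and flush from a background loop.

```python
from collections import deque


class StoreAndForward:
    """Buffer messages locally and forward them when the network allows."""

    def __init__(self, send, max_buffered=10_000):
        self._send = send  # callable(msg); assumed to raise ConnectionError on failure
        self._buffer = deque()
        self._max = max_buffered

    def enqueue(self, msg):
        """Accept a message without any network call. False if the buffer is full."""
        if len(self._buffer) >= self._max:
            return False
        self._buffer.append(msg)
        return True

    def flush(self):
        """Forward buffered messages in order; stop at the first failure."""
        delivered = 0
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                break  # keep the message and retry on the next flush
            self._buffer.popleft()
            delivered += 1
        return delivered

    def pending(self):
        return len(self._buffer)


# Simulated flaky network: down during the burst, up afterwards.
network = {"up": False}
received = []


def send(msg):
    if not network["up"]:
        raise ConnectionError("network down")
    received.append(msg)


sidecar = StoreAndForward(send)
for i in range(3):
    sidecar.enqueue(f"event-{i}")  # returns immediately, even while the network is down

print(sidecar.flush(), sidecar.pending())  # 0 3
network["up"] = True
print(sidecar.flush(), sidecar.pending())  # 3 0
```

The application only ever calls `enqueue`, so a burst or a network blip shows up as a growing buffer rather than as request latency.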

