
An introduction to RabbitMQ - olikas
https://www.erlang-solutions.com/blog/an-introduction-to-rabbitmq-what-is-rabbitmq.html
======
kabacha
I'm really surprised to see so much positivity about rabbitmq here when it's
probably the most sworn-at software in the space.

Let me share my anecdote. At my last workplace I got onboarded onto rabbitmq,
and it was such painful software to work with, and so nearly impossible to set
up locally, that I quietly sneaked in a simple redis list as a queue
alternative for my dev environment. The whole of rabbitmq and its pika library
was replaced by 3 lines of python and a redis server.
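
For readers wondering what a Redis-list queue of that size might look like, here is a minimal sketch. It is not the commenter's actual code. It assumes a redis-py style client and uses only its `lpush` and `brpop` methods; the queue name `jobs` and the helper names `enqueue` and `dequeue` are made up for illustration.

```python
import json

QUEUE = "jobs"  # hypothetical queue name


def enqueue(client, payload):
    """Producer side: push a JSON-encoded job onto the head of the list."""
    client.lpush(QUEUE, json.dumps(payload))


def dequeue(client, timeout=5):
    """Consumer side: block until a job arrives or the timeout expires.

    BRPOP pops from the tail, so together with LPUSH the list behaves
    as a FIFO queue. Returns None on timeout.
    """
    item = client.brpop(QUEUE, timeout=timeout)
    if item is None:
        return None
    _key, raw = item
    return json.loads(raw)


# With a real server this would be used roughly as:
#   import redis
#   client = redis.Redis()
#   enqueue(client, {"task": "resize", "id": 1})
#   job = dequeue(client)
```

Jobs pile up in the list while no consumer is running, but a popped job is gone: if the consumer crashes after `brpop` and before finishing the work, that job is lost. That is an at-most-once trade-off, which may or may not matter for a given workload.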

One day rabbitmq died and it took the sysadmins a few weeks to get it back
running. In that time I deployed my simple redis list and never looked back.
To this day the redis solution works without any friction whatsoever, with a
fraction of the resources.

Rabbit's AMQP exchange model is severely flawed and convoluted. It's the
worst example of corporate software where everything works and doesn't work at
the same time.

I wouldn't recommend rabbitmq to my worst enemy yet there's still something
attractive about it. Maybe there's a sane alternative? Maybe zeromq?

~~~
st1ck
Does your solution handle the situation when the consumer crashes and the
queue has to accumulate messages (while RAM allows) until the consumer is up
again (maybe 1 day later)? AFAIK ZMQ can't guarantee this.

~~~
twic
It must be emphasised that, despite the name, ZeroMQ is not a message queue.
It is a networking library. The current blurb says:

> ZeroMQ (also known as ØMQ, 0MQ, or zmq) looks like an embeddable networking
> library but acts like a concurrency framework.

And the old meme was "ZeroMQ is a replacement for Berkeley sockets".

It's a pretty cool networking library. But it makes no sense to think of it in
the same slot as RabbitMQ, or even Redis.

That the GP mentioned it does make me wonder if they don't really understand
what they're doing.

EDIT I love that the official guide actually has this diagram in! Pieter
Hintjens's death was a sad loss.
[http://zguide.zeromq.org/page:all#How-It-Began](http://zguide.zeromq.org/page:all#How-It-Began)

~~~
jen20
As an aside, the ZeroMQ Guide is the finest piece of technical writing I have
ever seen - closely followed by Pieter’s (sadly unfinished) “Scalable C” book.
His non-technical writing is also excellent.

~~~
francislavoie
I had a fun time implementing the Paranoid Pirate pattern with my coworker a
few years back. He wrote the server part in Python, I wrote the client in PHP.
We essentially built it as a wrapper to run some C code our boss wrote that we
didn't want to write a PHP extension for - we used Python as a broker to allow
for some concurrency. Worked super well.

------
webscalist
RabbitMQ has a huge learning curve if you're trying to build a worker queue.

First, you'll learn about ack/noack and get the worker ack on success.

Then, you'll learn about dead letter queue ... etc for delayed retries.

Now, you'll have a topic exchange and a bit hairy routing in place using
wildcards.

And you mistakenly set dead letter routing key so that expired messages end up
in multiple queues (retry queues and actual worker queue ... ).

Then you rewrite your service in python and use Celery or something.

It's nearly impossible to get RabbitMQ working correctly within a few months.

And I forgot about HA. Paying for hosted RabbitMQ might be better. But
CloudAMQP in particular could be tricky as well. It can run out of AWS IOPS
and your production gets hosed.

Also, setting up monitoring on queue health, shoveling error queues ... etc
takes time to learn and apply. Be careful about routing keys when you shovel
an error queue to a topic.
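
To make the ack / retry / dead-letter progression above more concrete, here is a toy in-memory model. It is plain Python, not the RabbitMQ API: each "queue" is a `deque`, and the names `run_worker` and `MAX_ATTEMPTS` are invented for this sketch. A common RabbitMQ arrangement uses a retry queue with a TTL that dead-letters back to the work queue; the sketch assumes that shape and skips the delay.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry limit


def run_worker(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Toy model of ack / reject with a retry queue and an error queue.

    Returns (acked, errored) lists of message bodies.
    """
    work = deque((body, 1) for body in messages)  # (body, attempt number)
    retry = deque()   # stands in for a dead-letter retry queue
    acked, errored = [], []

    while work or retry:
        if not work:
            # In the assumed RabbitMQ setup this hop would be a TTL expiry
            # on the retry queue that dead-letters back to the work queue.
            work.extend(retry)
            retry.clear()
        body, attempt = work.popleft()
        try:
            handler(body)
        except Exception:
            if attempt >= max_attempts:
                errored.append(body)               # parked for inspection
            else:
                retry.append((body, attempt + 1))  # rejected, retry later
        else:
            acked.append(body)                     # ack only after success
    return acked, errored
```

The routing mistake described above corresponds to the retry hop delivering a message to more than one queue, at which point each copy starts counting attempts on its own.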

~~~
symplee
Can anyone recommend an easier alternative?

~~~
mcsoft
[http://nats.io](http://nats.io)

~~~
ies7
I heard a lot of praise about Nats, but isn't it more like a kafka
alternative? Someone new needs to spend some time grasping the stream concept.

~~~
jonathanoliver
NATS by itself is designed to be more of an always-on style queuing system
(the term they use is "dial tone") but doesn't handle node failures by itself.
If you're looking for a Kafka-flavored NATS, there's a new release I saw
recently called LiftBridge that adds some durability to the NATS protocol.

~~~
shaklee3
Someone mentioned this already below, but nats streaming also adds durability.

------
akyu
RabbitMQ is great. One of the few pieces of software I've used that "just
works".

The only downside is once you get message-queue-pilled, you start seeing
opportunities to refactor/redesign with message queues everywhere and it can
be hard to resist the urge. It really is remarkable how, when used
appropriately, message queues can dramatically simplify a system.

~~~
dkersten
How is it for production deployment? I was considering it for something
recently, but got overwhelmed by the documentation on setting up a fault-
tolerant production deployment, so have been avoiding it. Was this an
overreaction? What is your experience with that?

Also, do you happen to know how well it works in a fault-tolerant way for
communicating between services that are in different data centers?

My main use-case is to receive status/change notifications from a service
running elsewhere from the API server servicing the UI, in order to avoid
polling for new data.

~~~
zonk_
We use it in a fairly big scale for our slack bot system. It was just set up
once, as akyu said, and since then it just works. Whenever we had troubles, it
was always anything other than RabbitMQ.

I've also looked into other solutions (ActiveMQ, Google PubSub, ...) and
RabbitMQ is by far the most straight-forward and quick to set up. There are
some edge cases that it doesn't cover as well, for example automatic retries,
but there are some "RabbitMQ patterns" to make it work. For a simple message
broker/queue system, it's great and the docs are also great.

~~~
dkersten
Was it easy to setup in terms of reliability and failover?

Given what you and larrik are saying, I think I need to give it a trial run,
but it's a project with a tiny team, so I want to be sure it won't be the cause
of sleepless nights when things go wrong. It sounds like RabbitMQ is quite
solid and shouldn't be the cause for concern, which is promising!

Is there anything I should keep in mind for running it in production? Any best
practices or gotchas, based on your experience (eg don't run in docker, or
make sure there's lots of RAM or things like that)? I guess it's all in the
production checklist. I need to read through it all again!

~~~
ekimekim
This came up in several other threads here: Don't use RabbitMQ's clustering.
It's surprisingly brittle and hard to recover from.

The accepted wisdom that I've seen is to run a single broker with a completely
independent hot spare. But of course switching over to your hot spare will
violate most of the guarantees that Rabbit gives you around durability,
ordering etc, so you have to be very careful how you use it.

I desperately want to like Rabbit (and have used it heavily in the past) but
right now I wouldn't use it if I can get away with anything else, it just has
no real HA story.

~~~
jackvanlightly
Rabbit dev here. We released quorum queues a few months ago. It's a Raft based
replicated queue that addresses all the old problems.
[https://www.rabbitmq.com/blog/2020/04/20/rabbitmq-gets-an-
ha...](https://www.rabbitmq.com/blog/2020/04/20/rabbitmq-gets-an-ha-upgrade/)

~~~
sciurus
Thanks for all your work on RabbitMQ, and for your great blog posts about it
and other messaging systems.

For anyone who wants to understand the potential complexities of HA RabbitMQ,
spend some time reading [https://jack-vanlightly.com/blog/2018/8/31/rabbitmq-
vs-kafka...](https://jack-vanlightly.com/blog/2018/8/31/rabbitmq-vs-kafka-
part-5-fault-tolerance-and-high-availability-with-rabbitmq)

------
adamkf
I've had truly terrible experiences with RabbitMQ. I believe that it should
not be used in any application where message loss is not acceptable. Its two
big problems are that it cannot tolerate network partitions (reason enough to
never use it in production systems, see
[https://twitter.com/antifuchs/status/735628465924243456](https://twitter.com/antifuchs/status/735628465924243456)),
and it provides no backpressure to producers when it starts running out of
memory.

In my last job, we used Rabbit to move about 15k messages per sec across about
2000 queues with 200 producers (which produced to all queues) and 2000
consumers (which each read from their own queues). Any time any of the
consumers would slow down or fail, rabbit would run out of memory and crash,
causing sitewide failure.

Additionally, Rabbit would invent network partitions out of thin air, which
would cause it to lose messages, as when partitions are healed, all messages
on an arbitrarily chosen side of the partition are discarded. (See
[https://aphyr.com/posts/315-jepsen-
rabbitmq](https://aphyr.com/posts/315-jepsen-rabbitmq) for more details about
Rabbit's issues and some recommendations for running Rabbit, which sound worse
than just using something else to me.)

We experimented with "high availability" mode, which caused the cluster to
crash more frequently and lose more messages, "durability", which caused the
cluster to crash more frequently and lose more messages, and trying to
colocate all of our Rabbit nodes on the same rack (which did not fix the
constant partitions, and caused us to totally fail when this rack lost power,
as you'd expect.)

These are not theoretical problems. At one point, I spent an entire night
fighting with this stupid thing alongside 4 other competent infrastructure
engineers. The only long term solution that we found was to completely
deprecate our use of Rabbit and use Kafka instead.

To anyone considering Rabbit, please reconsider! If you're OK with losing
messages, then simply making an asynchronous fire-and-forget RPC directly to
the relevant consumers may be a better solution for you, since at least there
isn't more infrastructure to maintain.

~~~
tmarice
Rabbitmq blocks producers when it hits memory high watermark (default 40% of
available RAM) -
[https://www.rabbitmq.com/memory.html](https://www.rabbitmq.com/memory.html)

------
wegs
My general problem is that it's really hard to figure out which architecture
is right for which system.

There's a different architecture for:

* one queue with billions of messages

* millions of queues with small numbers of messages per queue

* many queues with many messages per queue

There are also different topologies:

* Anyone can send a message to anyone (O(n^2) queues)

* One publisher with millions of subscribers

* One subscriber with millions of publishers

* Complex processing networks, where messages get routed in complex ways between processing nodes.

There are differences in timing:

* More-or-less instant push notifications

* Jobs which run within e.g. 5 minutes with polling

* Jobs which run in hours/days, with a cron-style architecture

And in reliability:

* Messages get delivered 100% of the time, and archived once delivered

* Messages get delivered 99.999% of the time, but might be dropped on a system outage

* ... all the way down to ephemeral pub-subs

... and so on.

I'd give my VP's right eye to get a nice chart of what supports what. For the
most part, I've found build to be cheaper than buy due to lack of benchmarks
and documentation for my use cases. Otherwise, you build. You benchmark. You
optimize. And things melt down.

My use case right now requires a large number of queues (eventually millions).
I'd like to have an archival record of messages. Peak volume is moderate
(several messages per second per queue), but usage patterns are sporadic (most
queues are idle most of the time). Routing is slightly complex but not
super-complex (typically, about 30 sources per sink, at most 200; most sources only
go to one sink, but might go to 2-3). Messages are relatively small
(typically, around 1k), but isolated messages might be much bigger (still
<1MB, but not small).

My experience has been that when I throw something like that into pick-your-
queue/pub-sub, things melt down at some point, and building representative
benchmarks is a ton of work.

~~~
robbyt
All software breaks at some point. If you're dealing with this scale of load,
it's mandatory to perform synthetic load testing to validate, otherwise you're
just guessing what the breakage threshold will be.

------
tombert
I feel like RabbitMQ is sort of the "swiss army knife" of message queues, and
I mean that in the nicest way possible.

People will compare it to Kafka, claiming that its pubsub is faster than
Rabbit's, but that's sort of missing the point: Rabbit thrives because it's
easy to set up, will work well for 99% of cases, and handles nearly _every_
kind of distributed problem you're likely to come across.

I recently did a project with Rabbit on my home server, and while the project
had some issues, the issues were _never_ Rabbit.

~~~
wenc
Rabbit doesn't have Kafka's ability to massively distribute and scale (it does
have a distributed story but from what I hear few explore it). But Rabbit also
supports more complex use cases than Kafka because its messaging protocol
(AMQP) is more intelligent. Unless you're a "web-scale"/s company, Rabbit's
scale even on one node is likely enough.

I've been using Rabbit in production for RPC and pub/sub for the past 5 years
(single instance running on a non-dedicated VM, medium traffic) and it's been
pretty easy to setup and has been pretty reliable in practice.

I've always been concerned about losing messages, and I did have to learn to
turn on persistence and durability for messages to survive server
interruptions, but it was easy enough. Message acknowledgements are also a
nice feature, and Rabbit is able to achieve at-least-once messaging semantics.

~~~
tombert
Yeah, I don't dispute that for certain usecases, Kafka is definitely the
better choice, use the right tool for the right job.

That said, for most small to medium-large tasks, Rabbit will handle things
without much trouble, making it a good fit for most common usecases.

------
hnrodey
I'm very well versed on RabbitMQ. We use it internally in a .NET codebase.

Anyone considering RabbitMQ needs to read up on "network partitions", how to
build your cluster to avoid them (odd number of nodes and pause_minority),
your recovery strategy for when a network partition occurs (it will occur),
your personal/organizational tolerance for message loss and a plan for how you
will upgrade your cluster at some later date (ensure you architect your
application to handle whatever type of upgrade strategy you will pursue).

There are definitely ways to operate to minimize these failures but you SHOULD
KNOW ABOUT THEM before you add this service to your environments.
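
A rough way to see why an odd node count pairs well with `pause_minority`: in that mode, nodes that find themselves on a side of a partition holding half the cluster or fewer pause themselves, so only a strict majority keeps serving. The sketch below models just that arithmetic. The function name is hypothetical and this is not RabbitMQ code.

```python
def surviving_sides(cluster_size, partition_sizes):
    """Return the sizes of the partition sides that keep running.

    Models only the majority rule: a side stays up if it holds a strict
    majority of the cluster; sides with half the nodes or fewer pause.
    """
    if sum(partition_sizes) != cluster_size:
        raise ValueError("partition sizes must add up to the cluster size")
    return [n for n in partition_sizes if 2 * n > cluster_size]


# 3 nodes split 2/1: the pair keeps serving, the single node pauses.
three_node_split = surviving_sides(3, [2, 1])
# 4 nodes split 2/2: neither side has a majority, so every node pauses.
four_node_split = surviving_sides(4, [2, 2])
```

With an even node count, an even split leaves no side running at all, which is the case an odd cluster size avoids.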

~~~
fmorel
If you're using RabbitMQ on .Net, I highly recommend using NServiceBus. It's
made working with queues _so easy_. It handles maintaining a connection and
retrying/acknowledging messages for you.

~~~
hnrodey
Hindsight is the best site. That's definitely what I would do if I was
starting a new project using RabbitMQ. Although I'll defend myself on this
front; I inherited our RabbitMQ project from the developer who left the
company 7/8 of the way through the implementation. I had the "make it work"
directive and not the decision making luxury he had from the beginning.

------
crad
Using the opportunity to pimp my book, RabbitMQ in Depth:
[https://www.manning.com/books/rabbitmq-in-
depth](https://www.manning.com/books/rabbitmq-in-depth)

:)

~~~
LeonM
So weird seeing you post that, as I literally have this book on my desk right
now.

Thanks Gavin, I learned a lot from reading it!

~~~
crad
That's awesome! I'm glad it was useful!

------
davidcorbin
RabbitMQ has been awesome in my experience. One of the few tools that just
works and has a super useful management web interface and Prometheus support
among other plugins.

For those noting HA and scalability, it's not meant for use cases where
(virtually infinite) horizontal scalability is the biggest concern. If you
need horizontal scalability at a massive scale, use Kafka. But for the
majority of cases, you can get away with limited scalability and the prod
setup, development experience, and reliability of RabbitMQ are unmatched from
my experience.

------
halfmatthalfcat
I've been trying to rationalize using either RabbitMQ or Kafka for something
I'm building. High messages per second but with more complex routing
topologies.

Rabbit seems to be the right path but I'm worried about scaling out as many
sources seem to point to Kafka being more scalable (at least horizontally).
I've been looking into Rabbit's Federation but it's still not clear if that
will solve the problem down the road.

Can anyone shine some light?

~~~
atombender
Adding to what the sibling comment says, be careful about buying into
RabbitMQ's clustering; having run it for years, I found it to be extremely
brittle.

We often lost entire queues because a small network blip caused RabbitMQ to
think there was a network partition, and when the other nodes became visible,
RabbitMQ has no reliable way to restore its state to what it was. It has a
bunch of hacks to mitigate this, but they don't solve the core problem; the
only way to run mirrored queues ("classic mirrored queues", as they're now
called) reliably is to disable automatic recovery, and then you have to
manually repair RabbitMQ every time this happens. If you care about integrity,
you can use the new quorum queues instead, which use a Raft-based consensus
system, but they lack a lot of the features of the "classic" queues. No
message priorities, for example.

I've never used federation or Shovel, which are different features with other
pros/cons.

If you're willing to lose the occasional message under very high load, NATS
[3] is absolutely fantastic, and extremely fast and easy to cluster.
Alternatively, NATS Streaming [4] and Liftbridge [5] are two message brokers
built on top of NATS that implement reliable delivery. I've not used them, but
heard good things.

[1]
[https://www.rabbitmq.com/partitions.html](https://www.rabbitmq.com/partitions.html)

[2] [https://www.rabbitmq.com/quorum-
queues.html](https://www.rabbitmq.com/quorum-queues.html)

[3] [https://nats.io/](https://nats.io/)

[4] [https://docs.nats.io/nats-streaming-
concepts/intro](https://docs.nats.io/nats-streaming-concepts/intro)

[5] [https://github.com/liftbridge-
io/liftbridge](https://github.com/liftbridge-io/liftbridge)

~~~
shoo
> lost entire queues because a small network blip caused RabbitMQ to think
> there was a network partition, and when the other nodes became visible,
> RabbitMQ has no reliable way to restore its state to what it was

I can offer a similar anecdote: we started seeing rabbitmq reporting alleged
cluster partitions in production after enabling TLS between rabbitmq nodes,
where manual recovery was needed each time.

After a bit of investigation we noticed that cluster partition seemed to
correlate with sending an unusually large message (think something dumb like
30 megs) through rabbitmq when TLS between rabbitmq nodes was enabled. What I
believe was happening was Rabbitmq was so busy encrypting/decrypting the large
message that it delayed sending or receiving heartbeats & then the cluster
falsely assumed there had been a network partition.

Mitigated that issue by rewriting system to not send 30 meg messages- there
was only one message producer that sent messages anywhere near that large, and
after a bit of thought realised it was not necessary to send any message at
all in that case (sending large message was to hack around some other old
system performance problem that had gotten fixed properly a year back, but the
hack that generated a huge message was still in place)

~~~
ramchip
Erlang/OTP-22 (released last year) introduced TLS distribution optimizations
and message fragmentation which sound very related to the problem you saw:

[http://blog.erlang.org/OTP-22-Highlights/](http://blog.erlang.org/OTP-22-Highlights/)

The fragmentation in particular addresses the problem where a large message
would block all other messages, including heartbeats, and cause nodes to look
“down” when they’re not.

~~~
shoo
fantastic. thank you for sharing that -- my anecdote about this problem is
slightly dated -- it would have been late 2017 early 2018 we were seeing the
issue, which indeed predates OTP 22 release.

------
mc_
We've used RabbitMQ since 2010 in KAZOO. I would argue, save one or two
instances in the intervening 10 years, that RabbitMQ is the most stable piece
of the infrastructure. I think it might be the only open-source project we
build on that we haven't committed upstream to because we haven't encountered
any issues in our usage.

------
harel
RabbitMQ is one of those pieces of software I usually forget are there. I
can't remember having to deal with any rabbit issue in the last few years.

~~~
rhodin
This might be the Achilles heel of RabbitMQ. It works so well that people
forget it for years, and then they have forgotten how to upgrade it, etc. :)

~~~
rawoke083600
Lol this! WRITE down that rabbitmq-web-admin passwd. After the setup and
first few weeks of checking the speed of your queues you will forget about it
and try to log in 1 year later :)

------
eqmvii
We started using RabbitMQ for several projects last year, and it's been a joy.

Some of that joy is surely just moving from older, creakier solutions. But it
hasn't let us down, and everyone is eager to use it for new features or
refactoring legacy code.

------
bvm
Using this opportunity to shout out to Rascal
([https://github.com/guidesmiths/rascal](https://github.com/guidesmiths/rascal))
which makes using RabbitMQ on Node an absolute joy.

~~~
pc86
Same with MassTransit[0] and .NET. We have several distributed .NET Core
services running in our data center, services running on employee PCs, etc all
communicating via RMQ with MassTransit and it's great. The primary maintainer
is very active (streams every Thursday evening) and the documentation has gone
from "pretty bad" to really good in the last few months.

[0] [https://masstransit-project.com/](https://masstransit-project.com/)

~~~
jonathanoliver
MassTransit is awesome! I love what Chris Patterson (the author) did. It
essentially allows you to swap out RabbitMQ for SQS or Azure Service Bus or a
few others. Pretty cool stuff if you're in .NET land.

------
rexarex
I have never had a good experience with RabbitMQ wherever I have worked. Often
it was buggy and unreliable. It’s almost always been something shoehorned
into a service, but failed to gain widespread adoption with future services.
Furthermore, it’s usually some hot potato no one even wants to deal with. We
have written some code around it to make it more reliable. You quickly figure
out why there seem to be so many half baked implementations of it wherever
you go work.

It’s basically caught between being too bloated and complex for use with
smaller systems (as some commenters have poked at people for not being the
‘right’ kind of person to be running it)

While at the same time, it’s not robust and reliable enough to use in prime
time.

What’s left is this enticing and sexy sounding message broker called RabbitMQ
that actually just sort of sucks.

In my experience someone gets stoked on trying this out, but once everything is
all implemented it disappoints, and the system or service it is a part of ends
up a one-off because future services use something more mature the next time
around.

For scale I have used NSQ to handle millions of messages a second and then for
smaller scale AWS services like SQS can handle things much more reliably.

------
jugg1es
I love RabbitMQ but deploying/managing a cluster can be tricky. We had
problems with network partitioning and since we didn't really need a cluster
for performance reasons - only availability - we switched to a single node.

~~~
airfreak
Try the new quorum queues, they don't have those issues.

------
nicodjimenez
It's working well for us but we occasionally get blips where for very short
periods of time messages get "stuck" in between application code events on
different servers and we cannot figure out why. It's very rare. Maybe a burst
of 5 messages every 10 million messages.

Any ideas on how to even debug this type of thing? Help! We think it might be
a tcp connection failure but we have no idea.

~~~
sethammons
Tcpdump and wireshark?

------
pfarrell
One big thing I’ve appreciated about RabbitMQ is how well it separates
publishing, message routing, and subscription concerns. Plus it’s never been
the issue in any infrastructure I’ve encountered it in.

------
niffydroid
We use AWS SQS and Rabbit. At our scale, SQS is easy peasy and we can wrapper
it to make http calls instead of using SQS, as we're using AWS Beanstalk
workers. SQS is generally quicker to get up and running with and we can have
metrics out the box. With rabbit we use it for some other stuff and it works
just fine, it's when things go into a black hole we struggle, but that's our
lack of knowledge.

Depending on your scale, we find SQS is cheaper than a managed rabbit service.
Although I'd be interested in using kafka!

------
spicyramen
Used RabbitMQ for a call center handling thousands of calls per second. It
worked fine integrated with Flower, Celery and Python... but once we went to
production, it became a black box where it was hard to find documentation or
support for any setting. We ended up having to build huge machines with tons
of memory and CPU and still saw messages lost with no explanation. Ended up
moving to PubSub and rebuilt the whole app.

~~~
vorpalhex
Rabbit is not a "turn it on and hope it works" kind of solution and if it's a
blackbox to you then you shouldn't use it. AMQP is a relatively fancy protocol
and Rabbit is endlessly configurable which is both a pro and a con. You will
need to develop expertise in Rabbit to use it well at scale.

------
fake-name
I've been using rabbitmq heavily for a fairly large hobby project (20-100
messages/sec) for a few years now. I'm generally happy with it, but there are
a number of caveats I've learnt.

1\. If you have large messages and use keepalives (and you'll need
keepalives), you need to write your own message fragmentation.

2\. There are no python libs that just work. I'm currently using a vendored
version of amqpstorm with a bunch of hacks to handle wedged connections. I
have some AMQP connections that are intercontinental, and I've been able to
wedge literally every other AMQP library.

3\. If you have a single open connection, it _will_ get stuck from time-to-
time. With a bunch of both in-band and out-of-band keepalives, I've got it to
the point where I don't have things permanently block, but you should expect
things getting stuck for ~2x your heartbeat time periodically. This doesn't
seem to result in message loss. I've dealt with this by just running LOTS of
concurrent connections, and aggregating them client side. This has worked
fine.

4\. In general, exactly-once delivery isn't a thing. You should design either
for at-most-once, or at-least-once delivery modes exclusively. Idempotency is
your friend.

5\. The tooling /around/ the rabbitmq server is a dumpsterfire.

Basically, I feel like the core server is super durable (note: I'm not running
a cluster, so this doesn't generalize to multi-instance cases), but the
management stuff is god-awful. The main management CLI tool actually calls the
HTTP interface, which is kind of ridiculous. I've occasionally run into a
situation where I wound up with leaking temporary exchanges, and just flushing
bogus exchanges is super annoying.

I don't think there's any other options that can do what rabbitmq does for my
use-case, but it's had quite the learning curve.
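
Point 4 above (design for at-least-once and lean on idempotency) can be sketched as a consumer that remembers which message ids it has already handled. This is a minimal illustration under stated assumptions: every message carries a unique id chosen by the producer, and the class and method names are hypothetical rather than part of any AMQP library.

```python
class IdempotentConsumer:
    """Wraps a handler so redelivered messages are processed only once.

    The seen-set lives in memory here; a real system would need to
    persist it, or make the handler's own writes idempotent instead.
    """

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()

    def on_message(self, message_id, body):
        """Return True if the handler ran, False for a duplicate."""
        if message_id in self._seen:
            return False            # duplicate delivery: safe to ack and skip
        self._handler(body)
        self._seen.add(message_id)  # record only after the handler succeeded
        return True
```

If the handler raises, the id is not recorded, so a redelivery of the same message gets another try.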

~~~
zbentley
> If you have large messages and use keepalives (and you'll need keepalives),
> you need to write your own message fragmentation.

I'm confused by what you mean by that. Do you mean "large" as in "take a long
time to process in the consumer"? If so, and if your consumer is not issuing
heartbeats concurrently with message processing, then that is true.

> There are no python libs that just work.

Completely agree. Having hacked on and patched the code inside Celery, it's
really quite a bummer. I think this is because the Python libs try to abstract
over things that ... just straight up can't be abstracted away given the
semantics of AMQP: specifically connection-drop-detection, "resumption" of a
consume (not really possible; this isn't Kafka), and the specific error code
classes (connection-closed vs channel-closed vs information).

> If you have a single open connection, it will get stuck from time-to-time.

Are you talking about publishing connections? Consuming connections? One used
for both? What does "stuck" mean? I'd be interested in hearing more about
this.

> exactly-once delivery isn't a thing

Kinda pedantic, but exactly once _delivery_ is possible in some very
restricted situations (see Kafka's implementation of this guarantee:
[https://www.confluent.io/blog/exactly-once-semantics-are-
pos...](https://www.confluent.io/blog/exactly-once-semantics-are-possible-
heres-how-apache-kafka-does-it/)). Exactly once _processing_ is what's tough-
née-impossible. So yeah, idempotence is great.

~~~
fake-name
> I'm confused by what you mean by that.

By large, I mean 10+ MByte.
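
For a sense of what application-level fragmentation of a payload that size could involve, here is one possible sketch. The thread does not describe the actual scheme, so everything here is an assumption: the header fields `id`, `seq` and `total`, the chunk size, and the helper names are all hypothetical, and the headers would have to be carried alongside each fragment (for example as message headers).

```python
import uuid

CHUNK_SIZE = 128 * 1024  # hypothetical fragment size in bytes


def fragment(payload, chunk_size=CHUNK_SIZE):
    """Split one logical message into (header, chunk) pairs."""
    message_id = uuid.uuid4().hex
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    return [({"id": message_id, "seq": seq, "total": len(chunks)}, chunk)
            for seq, chunk in enumerate(chunks)]


class Reassembler:
    """Collects fragments, possibly out of order, until a message is whole."""

    def __init__(self):
        self._partial = {}  # message id -> {seq: chunk}

    def add(self, header, chunk):
        """Return the full payload once all fragments arrived, else None."""
        parts = self._partial.setdefault(header["id"], {})
        parts[header["seq"]] = chunk
        if len(parts) < header["total"]:
            return None
        del self._partial[header["id"]]
        return b"".join(parts[seq] for seq in range(header["total"]))
```

Smaller fragments leave gaps on the connection in which heartbeats can get through, at the cost of the receiver having to hold partial messages and eventually expire ones that never complete (not handled in this sketch).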

> Completely agree. Having hacked on and patched the code inside Celery, it's
> really quite a bummer.

I don't understand what the point of celery is. Literally everything I do
requires /some/ persistent state in the workers, and there's no way to do that
with celery.

> Are you talking about publishing connections? Consuming connections? One
> used for both? What does "stuck" mean? I'd be interested in hearing more
> about this.

TCP connections. As in, a connection to the server from a consumer. High
latency connections seem to exacerbate the issue.

I think the issue is the state machines server-side and client-side get out of
sync, and things just stop until the keep-alives/heartbeat cause the
connection to reset, but that's a bunch of time to wait with no messages.

I also ran into the issue that basically every python library had at least one
or two locations where `read()` was called without a timeout, but that was at
least easier to fix.
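
As an illustration of that last point, this is roughly what a bounded read looks like with Python's standard `socket` module. The helper name is made up, and real client libraries wrap their sockets differently, so treat it as a sketch of the idea rather than a patch for any particular library.

```python
import socket


def recv_with_timeout(sock, nbytes, timeout_s):
    """Read up to nbytes, or return None if nothing arrives in time.

    Without settimeout(), recv() on a blocking socket can wait forever
    when the peer silently goes away.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None  # caller can now decide to reconnect
```

Returning `None` on timeout gives the caller a point at which to tear the connection down and reconnect, instead of hanging inside the read.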

> Kinda pedantic, but exactly once delivery is possible in some very
> restricted situations (see Kafka's implementation of this guarantee:
> [https://www.confluent.io/blog/exactly-once-semantics-are-
> pos...](https://www.confluent.io/blog/exactly-once-semantics-are-pos...)).
> Exactly once processing is what's tough-née-impossible. So yeah, idempotence
> is great.

Well, it isn't _really_ a thing, so you at least shouldn't depend on it being
a thing for your architecture if possible.

~~~
zbentley
> By large, I mean 10+ MByte.

OK. Did Rabbit or your client libraries bug out when sending single giant
messages? What does message fragmentation (by which I assume you mean
splitting one logical message up over multiple AMQP messages? Or something
else?) have to do with keepalives (and what do you mean by keepalives?
Connection heartbeats? TCP keepalives?)?

> Literally everything I do requires /some/ persistent state in the workers,
> and there's no way to do that with celery.

Sure there is. In-memory caches persist between requests. And there's always
sqlite and friends. Celery's more intended for the "RPC/fire-and-forget" case
than stateful workloads, but it's not too painful to use those with it. And
you get the benefits of its (reasonably) hardened connection/heartbeat
management, which may help with some of your other issues.

Basically every time I've seen code that rolled its own bespoke consumer loop
for RabbitMQ, it was wrong in some fundamental ways; the state machine on the
consumer side did indeed get out of whack, and badly. Best to outsource the
"keep the connection alive, establish subscription, detect failures" work to a
higher-level library (like Celery) that provides a long-lived consumer so your
code can just be occupied with data processing.

------
carterklein13
Would anyone be able to explain the benefits of RabbitMQ over NATS? As far as
I've seen, it's really just that RabbitMQ is more feature-rich, which I
personally feel like isn't that crucial, as frankly many systems are not going
to take advantage of those more complex functionalities anyway.

~~~
jonathanoliver
Durability. If you need to push messages that don't get lost, RabbitMQ is a
pretty solid choice. In years past the clustering situation wasn't great and
there was some potential for message loss, though that seems to be resolved now
with quorum queues. The biggest difference between NATS and RMQ is the
durability guarantees and the at-least-once delivery guarantees that RMQ has.
NATS is more like ZeroMQ in that it expects the subscribers to be online.
There has been some work by others using that NATS protocol to create a Kafka-
like system (written in Go, I believe) called LiftBridge. So if you like NATS
and it's working for you and you want durability, take a look at LiftBridge.

~~~
shaklee3
This isn't true anymore. NATS Streaming has persistence, so the OP's question
still remains.

~~~
jonathanoliver
My understanding is that NATS (a protocol) and NATS streaming were related but
separate:

[https://github.com/nats-io/nats-site/issues/217](https://github.com/nats-
io/nats-site/issues/217)

(The issue is from 2017 but illustrates a distinction)

~~~
shaklee3
That's right, but I think at least since both are listed on their website as
different ways to run it that it should at least be considered a native
feature at this point.

~~~
carterklein13
All very interesting - this is great!

------
DubiousPusher
Rabbit saved my life. I had a project that involved getting the AMQP Proton
library working on the Xbox. Rabbit was so easy to set up and use, it gave me a
reliable way to test my work. Getting into AMQP at the time was confusing and
poorly documented. Rabbit did indeed "just work".

------
posharma
How does RabbitMQ compare with Kafka?

~~~
Raidion
I'm more familiar with SQS than RabbitMQ, but have used both, and have chosen
between queue and stream based solutions.

Kafka is a stream, and can be replayed (if you have it set up to store stuff).
Rabbit is simply a queue, and when the messages are gone, they're gone.

This means that queues are a lot smaller, but can only serve one set of
consumers at a time. If you want to have multiple things listening to
messages, you have to use fan-out patterns that place messages on multiple
queues. Queues can also suffer from less than atomic delivery, especially if
the system is distributed. This means you have to jump through some hoops and
add an atomic layer somewhere if you want to ensure you're not double
processing anything.

Kafka can have infinite retention (if you got the storage/$), and you don't
need to have multiple streams to service multiple consumers. Each consumer
stores where they are in the stream, and can traverse as needed. You'll need
to be careful to make sure that a single consumer is handling a single
partition to promise that you'll only process a message once.
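
A toy in-memory model of that idea, as a sketch only: the `Log` class and its method names are invented here, and real Kafka tracks offsets per consumer group and partition rather than per consumer name.

```python
class Log:
    """Append-only record list with one read position per consumer."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> index of next record to read

    def append(self, record):
        self._records.append(record)

    def poll(self, consumer, max_records=10):
        # Reading never removes records; it only advances this consumer's offset.
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        # Replay is just moving the offset back.
        self._offsets[consumer] = offset

log = Log()
for record in ["a", "b", "c"]:
    log.append(record)

first = log.poll("billing")                 # billing reads everything
second = log.poll("search", max_records=2)  # search reads independently
log.seek("billing", 0)
replayed = log.poll("billing")              # billing replays from the start
rest = log.poll("search")                   # search continues where it left off
```

Contrast with a queue, where a consumed message is gone and a second set of consumers needs its own copy.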

Managing streams can be a headache, but less so now if you have money to have
Amazon or Confluent manage it for you. They offer pretty much unlimited
scalability, and are the production grade solution for a ton of problems.

Queues are really simple to understand and build and still scale pretty dang
well. Just make sure your message processing is idempotent and make sure you
can handle if something is processed multiple times.
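
One common way to get there is to deduplicate on a message ID. A minimal sketch, assuming every message carries a unique ID; the `IdempotentHandler` name is hypothetical, and the in-memory set would need to be a durable store in a real system.

```python
class IdempotentHandler:
    """Wraps a handler so a redelivered message is processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: stands in for a durable store

    def handle(self, message_id, body):
        if message_id in self._seen:
            return False             # duplicate delivery: skip it
        self._handler(body)
        self._seen.add(message_id)   # record only after the handler succeeded
        return True

totals = []
h = IdempotentHandler(totals.append)
results = [h.handle("m1", 10), h.handle("m2", 5), h.handle("m1", 10)]
```

Recording the ID after the handler runs means a crash in between leads to a retry rather than a lost message, which is why the handler's own side effects still need to tolerate being repeated.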

------
lukebakken
I highly recommend all of Jack's blog posts about RabbitMQ - [https://jack-
vanlightly.com/blog/tag/RabbitMQ](https://jack-
vanlightly.com/blog/tag/RabbitMQ)

Jack works with me on the RabbitMQ core engineering team. We've been hard at
work to address a lot of the issues brought up in comments here. It's worth it
to try out our latest releases. The engineering team is very active with the
community and takes all constructive, helpful (i.e. reproducible) feedback
seriously. Feedback is encouraged via the rabbitmq-users mailing list. Thanks.

------
major505
I really like RabbitMQ. But I really dislike the database that it relies
on, Mnesia. I had a client that, because of licence issues, could only do one
operation at a time in the ERP software. So I used RabbitMQ to queue the
requests and do one at a time. It worked great, and was fast and light on
resources. But the power supply at the site was a problem: more than once the
place had a blackout, and when power returned Mnesia messed up and lost the
queues. So I ended up just making my own simple queue using sqlite on the server.

------
shishy
Debated using RabbitMQ but decided the infrastructure overhead was too high.

Ended up looking into `rq` and `arq` which were both excellent!

[https://python-rq.org/](https://python-rq.org/)

[https://arq-docs.helpmanual.io/](https://arq-docs.helpmanual.io/)

Would recommend if you're looking for a (faster) worker queue without all the
overhead (in my case, didn't need all the other features that came w/ RabbitMQ
so this got the job done).

------
dirtydroog
We use ZeroMQ a bit. It's been pretty much flawless as far as I can see but I
get the impression that it's becoming obsolete. Is RabbitMQ a viable
replacement?

~~~
heinrichhartman
You might find it interesting to note that Pieter Hintjens was one of the
core authors of the AMQP 0-9-1 Specification [1], which RabbitMQ
implements.

ZeroMQ was born out of a frustration with complex routing patterns and the
need for a broker-less architecture for maximal performance message delivery.

[1]
[https://www.rabbitmq.com/resources/specs/amqp0-9-1.pdf](https://www.rabbitmq.com/resources/specs/amqp0-9-1.pdf)

~~~
lulf
I'll add that the AMQP 1.0 spec (supported in Rabbit using a plugin) is a
peer-to-peer protocol that supports both the traditional broker use case,
'direct' p2p messaging and opens some interesting uses of message routers like
Apache Qpid Dispatch Router.

~~~
heinrichhartman
I am no expert, but I have heard PH say that it's much worse than the AMQP-0.9
Spec. It's a design-by-committee thing, where he was sidelined.

------
sum2000
We were looking into RabbitMQ but quickly retracted once we realized that it
does not support external OAuth2.0 providers in a straightforward way.

------
gchallen
I used RabbitMQ to distribute messages between components of a distributed
grading service that I wrote in Kotlin and deployed on Kubernetes.

My experiences were pretty mixed. Overall I found it to be more difficult than
I would have wanted to get simple things to work. Part of this seems to be a
problem with the Java library, which is not great. For example, IIRC you have
to be really careful not to create the same queue twice, even with identical
configurations, since the second time something blows up. At the end of the
day just a simple fan-out configuration ends up involving a lot of somewhat-
intricate code. It definitely does not Just Work (TM).

And then there were the bizarre hangs that I would experience during testing. I
set up a Docker Compose configuration so that I could test the various parts
of the system independently. It included one container running RabbitMQ to
simulate the cluster we have running on our cloud.

Usually tests ran fine. But then, from time to time, the client would just
hang trying to send a message through RabbitMQ. Unfortunately, again, the code
you need to just run a basic configuration using RabbitMQ is complex enough
that at first I was pretty sure that I had done something wrong. But after a
few hours of increasing frustration I finally broke down and discovered that a
simple test case that just sent a single message using code torn right out of
the docs would hang. Forever. (Or, long enough that I gave up waiting.)

After a lot of digging I found the culprit. RabbitMQ will just take its ball
and go home if the broker doesn't have enough disk space. Given that I use
Docker heavily for a lot of projects, the amount available to new containers
would vary a lot depending on what other data sets I had loaded or how
recently I had run docker system prune.

I filed an issue about this, asking to have a better error message displayed
when an attempt to send a message was made. The response was: there's already
an error message, printed during startup. You didn't see it? No. I must have
missed it among the hundreds of other lines of output that RabbitMQ spews when
it starts.

Overall my favorite part of this story is that RabbitMQ chooses to start but
refuse to send messages when low on disk space, when just crashing would be
much more useful and make it much easier to pinpoint what was going on.

Anyway, I'm in the market for a simpler alternative that's Kotlin friendly.

------
rawoke083600
Man I love this piece of software.. We used it as a bare-bones msg queue FIFO
and some FAN-OUT patterns. Basically only scratched the surface of what is
possible. But this beast ran our ETL distributed update system at PriceCheck
(S.A.'s largest price comparison service). Haven't worked there now for a few
years but back then the RabbitMQ was rock-solid for us !

------
fanna
A few years back I wrote a blogpost showcasing a nice use-case for RabbitMQ
and Elixir

[https://semaphoreci.com/blog/2017/03/07/making-mailing-
micro...](https://semaphoreci.com/blog/2017/03/07/making-mailing-microservice-
with-elixir-and-rabbitmq.html)

------
Dowwie
I've got a connection/channel question for those who have built solutions with
rabbitmq-- how did you decide as to how many connections and channels-per-
connection to use? Does connection pooling even make sense for RabbitMQ? My
impression is that channel pooling may make more sense. Thoughts?

~~~
Plugawy
An application usually has one connection, and many channels. Our pattern is
to dedicate one channel for all publishing and then N channels mapped to
consumer threads.

You don't have to pool connections as channels are multiplexed by them.

Things to watch out for:

- opening too many channels - these map to Erlang processes and can overwhelm
  your server if you go over ulimits
- sharing consumer channels between threads - you might see weird behavior
  (e.g. acking wrong messages etc)

We've built our own library/framework for creating resilient consumers, and it
enforces mapping 1:1 channels and consumer threads, as well as automatic
reconnections and channel clean ups.
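
A rough sketch of the 1:1 mapping, not our actual library: `FakeConnection` is a hypothetical stand-in for a real client connection, and `ChannelPerThread` is a name invented for this example.

```python
import threading

class FakeConnection:
    """Hypothetical stand-in for a client connection that hands out channels."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 1

    def channel(self):
        with self._lock:
            n = self._next
            self._next += 1
        return f"channel-{n}"

class ChannelPerThread:
    """Gives each thread its own channel and never shares one across threads."""

    def __init__(self, connection):
        self._connection = connection
        self._local = threading.local()

    def get(self):
        ch = getattr(self._local, "channel", None)
        if ch is None:
            ch = self._connection.channel()  # opened lazily, once per thread
            self._local.channel = ch
        return ch

pool = ChannelPerThread(FakeConnection())
seen = {}

def consumer(name):
    # Two calls from the same thread must return the same channel.
    seen[name] = (pool.get(), pool.get())

threads = [threading.Thread(target=consumer, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Reconnection and channel cleanup are left out; the point is only that a consumer thread can never pick up another thread's channel by accident.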

~~~
jonathanoliver
+1 for everything that's been said. Another thing to consider is message
throughput, if that's a concern. In the case of multiple channels per single
connection, note that a connection is a single TCP connection such that
multiple channels contend for the TCP stream. At the same time, connections
aren't completely free either.

The general takeaway from this should be: if you've got a particular stream of
messages (either a producer or a consumer) that pushes many thousands or even
tens of thousands of messages per second, use a separate TCP connection. For
anything else that is slower (dozens of messages per second), multiple
channels on the same connection work great.

One last consideration is that when a given channel misbehaves or you perform
an operation that the broker doesn't like, the only recovery that I've seen is
to shut down the entire connection which can affect other channels on the
same connection.

------
FlorianRappl
Using it in multiple projects: The software itself is great and provides great
value.

The only pitfall is the available libs. Especially with the .NET implementation
we had quite a lot of trouble. It's not following current .NET patterns and has
strange quirks. Does anyone know a good alternative to the "official" one?

~~~
lukebakken
> Especially with the .NET implementation we had quite a lot of trouble. Its
> not following current .NET patterns and has strange quirks.

It would be great to get specific, actionable feedback with your experience,
either via a message to the rabbitmq-users mailing list or via a GitHub issue.
The .NET client is an old library, but considerable effort went into improving
it in version 6.0. The plan for 7.0 is to address old patterns that remain in
the library. Feedback would help guide that effort.

I just released version 6.1.0-rc.1 and would appreciate testing if you have
time. Thanks!

~~~
jsmith45
The biggest issues are the public API surface.

If the library were being designed from scratch today, pretty much every
method on the model would be Async. After all, if it leads to any network I/O
of any kind, that can block.

Working with the current public API, trying to implement a publish wrapper
that never blocks, and returns a task that either completes when the publisher
confirm is received, or faults after some provided timeout, is a lot trickier
than it might sound.
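
To show the shape I was after, here it is sketched in Python with asyncio rather than .NET. Everything in it is an assumption for illustration: `ConfirmTracker`, the `send` callback (assumed to hand the frame off without blocking), and `on_confirm` (assumed to be called by whatever reads broker confirms).

```python
import asyncio

class ConfirmTracker:
    """Pairs each publish with a future resolved by the matching confirm."""

    def __init__(self):
        self._pending = {}   # publish sequence number -> future
        self._next_seq = 1

    async def publish(self, send, body, timeout_s):
        seq = self._next_seq
        self._next_seq += 1
        fut = asyncio.get_running_loop().create_future()
        self._pending[seq] = fut
        send(seq, body)  # assumption: non-blocking hand-off to the I/O layer
        try:
            # Completes on confirm, raises asyncio.TimeoutError otherwise.
            await asyncio.wait_for(fut, timeout_s)
            return seq
        finally:
            self._pending.pop(seq, None)

    def on_confirm(self, seq):
        fut = self._pending.get(seq)
        if fut is not None and not fut.done():
            fut.set_result(None)

async def demo():
    tracker = ConfirmTracker()
    loop = asyncio.get_running_loop()

    def send_ok(seq, body):
        loop.call_later(0.01, tracker.on_confirm, seq)  # confirm arrives soon

    def send_lost(seq, body):
        pass  # confirm never arrives

    ok = await tracker.publish(send_ok, b"hello", 0.5)
    try:
        await tracker.publish(send_lost, b"hello", 0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return ok, timed_out

outcome = asyncio.run(demo())
```

The tricky parts in the real client (confirms covering multiple messages, sequence numbers resetting on reconnect) are deliberately not modelled here.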

Recovery from network interruptions is complicated, and auto-recovery features
are limited, and in some use cases actually dangerous. For example, if you are
manually acknowledging messages to ensure end-to-end at-least-once delivery,
then you cannot safely use the auto-recovery, since the delivery numbers would
reset when the connection does, and you can accidentally acknowledge the wrong
message with delivery tag 5. (Acknowledge the new one, when you were trying to
ack the old one).

In my implementation, which included my own recovery, I ended up needing to
pass around the IModel itself with the delivery tags, so I can check if the
channel I am about to acknowledge on is really the same one I received the
message on. (There is no unique identifier of a channel instance, since even
the channel number is likely to get re-used).
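
The check itself is simple once the channel travels with the tag. A sketch in Python rather than C#, with `FakeChannel`, `Delivery` and `safe_ack` all invented for illustration (this is not the client's API):

```python
from dataclasses import dataclass

@dataclass
class Delivery:
    channel: object  # the channel instance the message arrived on
    tag: int
    body: str

class FakeChannel:
    """Hypothetical stand-in for a channel; delivery tags restart at 1."""

    def __init__(self):
        self._next_tag = 1
        self.unacked = {}
        self.is_open = True

    def deliver(self, body):
        tag = self._next_tag
        self._next_tag += 1
        self.unacked[tag] = body
        return Delivery(self, tag, body)

    def ack(self, tag):
        return self.unacked.pop(tag)

def safe_ack(current_channel, delivery):
    # Compare by identity: tags and channel numbers both get reused.
    if delivery.channel is not current_channel or not current_channel.is_open:
        return False
    current_channel.ack(delivery.tag)
    return True

old_ch = FakeChannel()
d_old = old_ch.deliver("old job")  # tag 1 on the old channel
old_ch.is_open = False             # connection drops, recovery kicks in
new_ch = FakeChannel()
d_new = new_ch.deliver("new job")  # tag 1 again on the new channel

stale_result = safe_ack(new_ch, d_old)  # refused: wrong channel instance
still_pending = dict(new_ch.unacked)
fresh_result = safe_ack(new_ch, d_new)  # accepted
```

Without the identity check, acking `d_old` on the new channel would have silently acknowledged "new job" instead.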

~~~
lukebakken
Thanks for taking the time to respond. I created this issue so that this
feedback is not lost - [https://github.com/rabbitmq/rabbitmq-dotnet-
client/issues/84...](https://github.com/rabbitmq/rabbitmq-dotnet-
client/issues/843)

If you have code you can share that you used to address shortcomings in the
client, we could get ideas from it for the next major release. Cheers!

------
tracker1
My biggest issue with RabbitMQ is the only official erlang downloads for
windows binaries are from the official website and slow as all getout in most
of the world.

I really don't get why they don't publish at least the windows binaries with
their github releases.

------
danielstocks
I used RabbitMQ together with python and celery quite extensively and it
scales really well. One thing we had trouble with though was to find a nice
mechanism to schedule tasks. E.g. “Run this task 12 hours before departure”.
Maybe AMQP is the wrong place to solve that problem.

~~~
modal-soul
I've been using something like this for exponential backoffs, but I think it'd
work for this case as well.

Let's say you've got one exchange and one main queue for processing:
jobs.exchange and jobs.queue respectively.

If you need to schedule something for later, you'd assert a new queue with a
TTL for the target amount of time (scheduled-jobs-<time>.queue). Also set an
expiry of some amount of time, so it'd get cleaned up if nothing had been
scheduled for that particular time in a while. Finally, have its dead-letter-
exchange set to jobs.exchange.

This could lead to a bunch of temporary queues, but the expiration should
clean them up when they haven't been used for a bit.
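
Roughly, the queue arguments look like this. `x-message-ttl`, `x-expires` and `x-dead-letter-exchange` are RabbitMQ's optional queue arguments; the helper name `delay_queue_spec` and the one-hour idle window are just assumptions for the sketch. Note the expiry is kept longer than the TTL, since a queue that expires takes any waiting messages with it.

```python
def delay_queue_spec(delay_ms, target_exchange="jobs.exchange",
                     idle_expiry_ms=3_600_000):
    """Build the name and arguments for a per-delay holding queue."""
    # The delay is part of the name because redeclaring an existing queue
    # with different arguments is rejected by the broker.
    name = f"scheduled-jobs-{delay_ms}.queue"
    arguments = {
        "x-message-ttl": delay_ms,                  # how long messages wait
        "x-expires": delay_ms + idle_expiry_ms,     # drop the queue when idle
        "x-dead-letter-exchange": target_exchange,  # where expired ones go
    }
    return name, arguments

name, arguments = delay_queue_spec(12 * 60 * 60 * 1000)  # 12 hours

# With a client such as pika this would be passed along the lines of
#   channel.queue_declare(queue=name, durable=True, arguments=arguments)
# (left as a comment so the sketch runs without a broker).
```

Dead-lettered messages keep their original routing key unless told otherwise, so jobs.exchange still has to route that key to jobs.queue.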

------
pachico
I tend to see people preferring RabbitMQ over Kafka and vice versa as if they
were products solving the same problems but they are not. They do have in
common the fact that they help decoupling applications but in different ways.
Both are great.

------
VectorLock
Everywhere I've worked in the last 10 years has been a cornucopia of
databases, programming languages, cloud platforms, linux flavors and
everything was different except for one thing: they all used RabbitMQ.

------
t_sawyer
My only experience with RabbitMQ is managing an Openstack environment. In that
environment, it's a huge resource hog and we had to put it on 3 separate bare
metal instances to keep it stable.

------
G4BB3R
I read but I still can't understand. I would like to know a very simple
example of something that can't be solved with a CRUD, and can be solved with
RabbitMQ.

~~~
Raidion
Really really bursty loads. You have a customer upload a data file and you
have to process it. If you crud it, you have a worker chopping it apart and
making sync API calls. If something fails in the middle, it has to retry, but
what happens if your container/database goes down when you're halfway through?
Now you have to reprocess that file again, etc.

You move this to a queue, and have a worker chop that data file up into
individual records, those records go onto a queue, and you can process them
however you want, no worries about something crashing and not being able to be
retried. If the database goes down, everything just pauses until it can go
again. You can limit the queue throughput to whatever you want to avoid having
to scale your API/Database.
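
A toy version of that flow, using an in-process `queue.Queue` as a stand-in for the broker (so unlike a real broker it would not survive a crash). The `run_workers` helper and the `flaky` processor are invented for the example.

```python
import queue
import threading

def run_workers(records, process, num_workers=2, max_attempts=3):
    """Process each record on worker threads, requeueing failures."""
    q = queue.Queue()
    for record in records:
        q.put((record, 1))  # (record, attempt number)
    done, failed = [], []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                record, attempt = q.get_nowait()
            except queue.Empty:
                return
            try:
                result = process(record)
                with lock:
                    done.append(result)
            except Exception:
                if attempt < max_attempts:
                    q.put((record, attempt + 1))  # retry just this record
                else:
                    with lock:
                        failed.append(record)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, failed

already_failed = set()

def flaky(record):
    # Simulates the database being down the first time "b" is processed.
    if record == "b" and record not in already_failed:
        already_failed.add(record)
        raise RuntimeError("database briefly unavailable")
    return record.upper()

done, failed = run_workers(["a", "b", "c"], flaky)
```

Only the record that failed is retried; the rest of the file is not reprocessed. The number of workers is also the throttle on how hard the downstream API or database gets hit.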

Can you handle stuff via all CRUD sync APIs? Sure, just like you could handle
running a restaurant where you have one person who takes the order and cooks
it and delivers it to a table. However, it's more efficient to have a waiter
(API) take requests and give them to a cook (queue based async worker) to
handle stuff that's not as time sensitive. This saves you a lot of money in
certain situations.

------
fasteo
I've been meaning to give RabbitMQ a try in the last few years, but our good
old beanstalkd is serving us well. It has all the features we need, and it
just works.

------
detaro
Given the recent "boom" of MQTT, anyone use RabbitMQ for MQTT clients? Any
benefits of using it that way over using MQTT-only brokers?

~~~
speedgoose
I do and it's great. It doesn't have some MQTT features such as persisted
messages or QOS 2 but if you don't need that, it's a fine MQTT broker.

~~~
jialutu
Wait, what are you talking about? RabbitMQ does have persistent messages, you
just need to set the queue as "durable", and the messages persist even during
failures.

~~~
speedgoose
Indeed I'm a bit confused. I remembered about having to find a workaround
because I couldn't use retained messages. It's actually not working only for
subscribers with wildcards : [https://github.com/rabbitmq/rabbitmq-
mqtt/issues/154](https://github.com/rabbitmq/rabbitmq-mqtt/issues/154)

------
zkirill
Would anyone using RabbitMQ as a replacement for GCM/FCM on Android mind
sharing their experiences?

------
agsilvio
Doesn't keep the messages for historical analysis. No deal. Kafka please.

------
perlgeek
At work we built a microservice-like (more like meso services) architecture
which uses RabbitMQ for messaging.

RabbitMQ itself is great, but there are some downsides to this architecture:

* Lots of tooling (for blue/green deployments, load balancing, autoscaling, service meshes etc.) assumes HTTP(s)+JSON or GRPC these days

* Getting people who aren't deep into software engineering to write a service that connects to RabbitMQ has a much higher perceived hurdle than making them write a HTTP service

* Operations is different than with HTTP-based services, and many operators aren't used to it

TL;DR: it's more of a niche product for inter-service communication, which
comes with all of the problems that niche products typically face.

------
Pilithe
Does anyone know of any big name brands using RabbitMQ? And if so, what
specifically for?

~~~
lukebakken
While an official list of customers can't be published, you can get some ideas
from the speakers at the last two RabbitMQ summits -
[https://rabbitmqsummit.com/](https://rabbitmqsummit.com/)

Also, see the following articles:

Laika - [https://www.rabbitmq.com/blog/2019/12/16/laika-gets-
creative...](https://www.rabbitmq.com/blog/2019/12/16/laika-gets-creative-
with-rabbitmq-as-the-animation-companys-it-nervous-system/)

Bloomberg - [https://tanzu.vmware.com/content/rabbitmq/keynote-
growing-a-...](https://tanzu.vmware.com/content/rabbitmq/keynote-growing-a-
farm-of-rabbits-to-scale-financial-applications-will-hoy-david-liu)

Goldman Sachs - [https://tanzu.vmware.com/content/rabbitmq/keynote-scaling-
ra...](https://tanzu.vmware.com/content/rabbitmq/keynote-scaling-rabbitmq-at-
goldman-sachs-jonathan-skrzypek)

Softonic - [https://www.cloudamqp.com/blog/2019-01-18-softonic-
userstory...](https://www.cloudamqp.com/blog/2019-01-18-softonic-userstory-
rabbitmq-eventbased-communication.html)

------
bdcs
nit: s/stabe/stable/

------
carapace
(Light gray thin sans-serif body font means you hate my eyes. Y U hate my
eyes?)

~~~
eplanit
One of many trends that needs to pass (Ooooo Apple did it, so it must be cool.
I'll make mine even lighter gray, so I'm that much cooler). :-)

Reader Mode is the answer. The creator of that deserves a Nobel Prize.

