Pulling data out of Redis and into RabbitMQ seems fraught with problems, whereas you could have the durable RabbitMQ use the Shovel plugin to pull data out of the faster in-memory RabbitMQ with a few lines of config.
"Later, I ... ran a loader.io load test. It failed this model due to very low throughtput, we performed a load test on a small 1G DigitalOcean instance for both models. For us, the STOMP protocol and slow RabbitMQ STOMP adapter were performance bottlenecks. RabbitMQ was not as fast as Redis, so we decided to keep it and work on the second model."
So they're using Redis as a queue for RabbitMQ :-) Just use Kafka, I say:
"Kafka handles over 10 billion message writes per day with a sustained load over the highest traffic hour that averages 172,000 messages per second. ... the footprint for the Kafka pipeline is only 8 servers per data center" http://sites.computer.org/debull/A12june/pipeline.pdf
That oughta be enough.
From the comments at the bottom of the article:
> I think it would depend on one's use case and requirements, but Redis is generally faster than RabbitMQ: it avoids the heavier AMQP protocol overhead, has a lighter internal implementation (data structures, etc., and C vs. Erlang), and we communicate with it locally over Unix sockets. But it's not as reliable as RabbitMQ.
In our case it is more important to process an HTTP request as fast as possible, while the processing, propagation, and storage of request data for analytics can be slow but has to be reliable; therefore we chose the Lua->Redis->agentredrabbit->RabbitMQ model.
Personally I wouldn't think it would be an issue; elsewhere he mentions "we aim for less than 40ms", and I can't imagine enqueuing a (seemingly) small message in RabbitMQ takes more than 1 or 2 ms.
whisk3rs: For our use case, messages had to be reliable, so we required publisher confirms, and you're sort of right: RabbitMQ did not satisfy our latency requirements. I've tried Shovel and it does not solve our problem the way we wanted. I don't have component-level benchmarks between Redis and RabbitMQ, but I've already shared the loader.io results for our two pipelines. Some details below.
noelwelsh: We're using Redis as an intermediate storage sink for RabbitMQ, and instead of passing messages from Redis to RabbitMQ one at a time, we move them in chunks. I'll explain below why we could not move away from Redis. "mbell" is correct about why we do it that way.
NOTE: this is going to be a long reply;
We don't use a CDN (such as Akamai, CloudFront, etc.) for our dynamic content because they don't give us the tweaking mechanisms we need while serving dynamic content on the same URL, such a design would break the user experience, and we don't want our users to keep changing the code installed on their websites -- for them it should just work. Many such services require you to install a snippet that pulls some JS from a URL; each time you modify something, they may either ask you to install new code with a new URL or, if they're using a CDN, send all the changes via the same URL (e.g., an increase in payload size due to unnecessary data).
So, we have two requirements that matter most: one -- acquire data reliably, as fast as possible, with minimum payload; two -- serve content dynamically with minimum payload and as fast as possible, because it is most important for us not to slow down our users' websites.
To do that we use a custom-compiled OpenResty (an nginx distribution) with LuaJIT, and our Lua code runs inside OpenResty, giving us minimal latencies and high processing speeds (no reverse proxies). When we started solving our scaling problem, there was no lua-resty library for publishing from Lua/OpenResty to RabbitMQ, and writing a production-grade lua-resty AMQP library would have required a lot of time (you can search our discussions on the openresty-en mailing list). So I started by writing an open-source STOMP-based library to use RabbitMQ's STOMP adapter; STOMP is much more lightweight and easier to implement than AMQP (which also has multiple versions). In our Lua/OpenResty code we then had two options: publish messages directly to RabbitMQ, or publish to Redis. Publishing to RabbitMQ was slow -- given the latencies, the AMQP overhead, and the small payload size (less than 1 kB), it was a deal breaker for us. But running Redis locally on a Unix socket, which our Lua/OpenResty code writes to, was much better in terms of latency, and transferring data in chunks from Redis to RabbitMQ improved throughput.
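The chunking idea can be sketched in a few lines of Python. This is a toy simulation, not agentredrabbit's actual code: the deque stands in for the Redis list, the callback for a RabbitMQ publish, and the chunk size is arbitrary.

```python
from collections import deque

CHUNK_SIZE = 3  # illustrative batch size, not agentredrabbit's real setting

def drain_chunks(redis_list, publish):
    """Pop messages off the (simulated) Redis list in chunks and hand
    each chunk to the (simulated) RabbitMQ publish function."""
    while redis_list:
        chunk = []
        while redis_list and len(chunk) < CHUNK_SIZE:
            chunk.append(redis_list.popleft())  # stands in for LPOP / LRANGE+LTRIM
        publish(chunk)  # one publish (and one publisher confirm) per chunk

# Seven small messages become three publishes instead of seven.
queue = deque(f"msg-{i}" for i in range(7))
published = []
drain_chunks(queue, published.append)
print(len(published))  # 3
```

The payoff is that the per-message AMQP round-trip cost is paid once per chunk rather than once per message, which is why the pipeline's throughput improves even though Redis adds a hop.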
Coming back to the earlier comment on this thread, Redis does not add a failure case; instead it gives us a failover. So far I've seen more cases of a whole datacenter (network) going down than of a single server, and in that case data sits in Redis. Our cron jobs, along with monitoring tools, make sure the local services on each server are up and running. If one of our servers goes down, our Anycast DNS (with a low TTL) switches traffic to the other available servers automatically. If the RabbitMQ servers/datacenter go down, data is pushed into RabbitMQ the next time the network/server comes back up; using reliable messaging ensures messages/chunks are written to disk (fsync) so they persist until consumed, and publisher confirms give us reliability. If the consumers die, data sits in RabbitMQ. Timeouts and latencies while moving data to RabbitMQ are handled by agentredrabbit.
Comments, questions welcome.
AMQP is indeed a complicated protocol so I can appreciate your reluctance to write a client.
The failure mode I was referring to with Redis is that you now have additional custom code that pulls from Redis and inserts into RabbitMQ. This is another process that can fail, and some of those failure modes could result in duplicate or lost messages. For example, agentredrabbit could shut down uncleanly so that the "dump" file never gets written. You also didn't mention whether you're running Redis with AOF or RDB persistence; RDB only writes to disk occasionally, so you could lose messages there. And if agentredrabbit fails, Redis could run out of memory (RabbitMQ, by contrast, automatically writes to disk once memory limits are exceeded).
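One standard way to narrow the duplicate/lost-message window in a Redis-backed relay is the reliable-queue pattern: RPOPLPUSH each message into a per-worker "processing" list, and remove it only after the broker confirms the publish. Whether agentredrabbit actually does this isn't stated anywhere in the thread; the sketch below just simulates the pattern with Python deques standing in for Redis lists.

```python
from collections import deque

def reliable_pop(main, processing):
    """Simulated RPOPLPUSH: move one message from the main list to a
    per-worker 'processing' list before it is published."""
    if not main:
        return None
    msg = main.pop()            # the RPOP side
    processing.appendleft(msg)  # the LPUSH side
    return msg

def ack(processing, msg):
    """Once RabbitMQ confirms the publish, drop the message from the
    processing list (LREM in real Redis)."""
    processing.remove(msg)

main, processing = deque(["a", "b"]), deque()
m = reliable_pop(main, processing)
# ... publish m to RabbitMQ and wait for the publisher confirm ...
ack(processing, m)
# Had the relay crashed before ack(), m would still sit in `processing`
# and could be re-queued on restart instead of being lost.
```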
I will believe that writing to Redis is /always/ faster than writing to RabbitMQ, but I'd really like to know what other things you guys tried to make RabbitMQ actually work for you. Did you experiment with queue settings? Memory limits? Erlang tweaks? SSDs? What was it about shovel that didn't work the way you wanted it to? I'm working towards what I hope will be a pure RabbitMQ infrastructure so knowing what you've tried and didn't try will be informative.
We followed a similar path using Redis as a queue, and ended up having to write a 'valve' process that watches the queue and starts popping off messages into a file when the queue length gets too long. At that point we had half-reimplemented a real queue.
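A minimal sketch of such a "valve" in Python, with in-memory stand-ins for the queue and the spill file and an arbitrary threshold:

```python
import io
from collections import deque

MAX_QUEUE_LEN = 5  # spill threshold (arbitrary for the example)

def valve(queue, spill_file):
    """Pop overflow off the queue into a file so memory stays bounded."""
    spilled = 0
    while len(queue) > MAX_QUEUE_LEN:
        spill_file.write(queue.pop() + "\n")
        spilled += 1
    return spilled

q = deque(f"msg-{i}" for i in range(8))
f = io.StringIO()       # stands in for the on-disk spill file
spilled = valve(q, f)
print(spilled, len(q))  # 3 spilled, 5 remain
```

As the commenter says, once you've written the watcher, the spill file, and the replay logic, you've rebuilt a good chunk of what a real broker already gives you.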
My only point is really just that redis-as-a-queue has some bad failure modes if you aren't careful.
Twitter too, shouldn't have wasted so much time reinventing the wheel and very publicly getting it wrong a few times. It probably cost them a lot more in the end too (but hey, it was only VC money, right?)
We had a somewhat simpler but related problem -- "tons of data coming in via RESTful services and don't want to flood directly to the db with massive parallel connections" -- and likewise solved it with Redis. Since we used Postgres as the backend, we (http://www.ivc.com) actually sponsored a project to create a Redis FDW:
so we could batch up 1,000s of inserts into a single database insert, thus reducing IO to disk.
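The batching step itself is simple; here is a hedged Python sketch that folds queued rows into one multi-row INSERT. The table and column names are made up, and a real driver such as psycopg2 would bind the %s placeholders rather than the raw strings shown here.

```python
def batch_insert_sql(table, columns, rows):
    """Build one multi-row INSERT so thousands of queued rows become a
    single statement instead of one round-trip per row."""
    cols = ", ".join(columns)
    placeholders = ", ".join(
        "(" + ", ".join(["%s"] * len(columns)) + ")" for _ in rows
    )
    params = [v for row in rows for v in row]  # flatten for the driver
    return f"INSERT INTO {table} ({cols}) VALUES {placeholders}", params

sql, params = batch_insert_sql("hits", ["url", "ts"],
                               [("/a", 1), ("/b", 2), ("/c", 3)])
print(sql)
# INSERT INTO hits (url, ts) VALUES (%s, %s), (%s, %s), (%s, %s)
```

One statement per batch means one WAL flush and one client round-trip per thousand rows instead of per row, which is where the disk-IO reduction comes from.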