
Scaling with Queues - paraschopra
http://engineering.wingify.com/scaling-with-queues/
======
whisk3rs
I'm curious about the justification for Redis in this pipeline. That seems to
introduce additional failure modes. Was it the case that a non-durable non-
confirming (in-memory only) RabbitMQ wouldn't satisfy the latency
requirements? I'd love to see some benchmarks comparing Redis to an in-memory
RabbitMQ.

Pulling data out of Redis and into RabbitMQ seems fraught with problems,
whereas you could have the durable RabbitMQ use shovel to pull data out of the
faster in-memory RabbitMQ with a few lines of config.

~~~
bhaisaab
Hi, I'm the backend engineer at Wingify who wrote this post.

whisk3rs: For our use case, messages had to be reliable, we required publisher
confirms, and you're sort of right: RabbitMQ alone did not satisfy our latency
requirements. I tried shovel and it does not solve our problem the way we
wanted. I don't have component-level benchmarks between Redis and RabbitMQ,
but I've already shared the loader.io results of our two pipelines. Some
details below.

noelwelsh: We're using Redis as an intermediate storage sink for RabbitMQ and,
instead of passing messages one at a time from Redis to RabbitMQ, we move them
in chunks. I'll explain below why we could not move away from Redis. "mbell"
is correct about why we do it that way.

NOTE: this is going to be a long reply.

This blog post talks mostly about data acquisition, so let me start with some
background on our backend services. You can read more about VWO on
visualwebsiteoptimizer.com, so I'm skipping that. We have multiple servers
across the globe which do data acquisition (capturing data for analytics),
and servers which serve the JavaScript snippets that are applied on a
customer's website. Our users install the VWO code on their website and,
depending on the test etc., the code from our servers is served dynamically
and applied (like "karolisd" commented, we do it as fast as possible, it is
dynamic, and we're still tuning our systems).

We don't use any CDN (such as Akamai, CloudFront, etc.) for our dynamic
content because they don't give us the ability to tweak dynamic content served
from the same URL; such a design would break the user experience, and we don't
want our users to keep changing the installed code on their websites -- for
them it should just work. Many such services require you to install code that
pulls some JS from a URL; each time you modify something they may either ask
you to install new code with a new URL or, if they're using a CDN, send all
the changes through the same URL (for example, an increase in payload size due
to unnecessary data).

So, we have two key requirements: one -- to do data acquisition reliably, as
fast as possible, with minimum payload; two -- to serve content dynamically
with minimum payload and as fast as possible, because above all we must not
slow down our users' websites.

To do that we use a custom-compiled OpenResty (an nginx distribution) with
LuaJIT, and our Lua code runs inside OpenResty, giving us minimal latencies
and high processing speeds (no reverse proxies). At the time we started
solving our scaling problem, there was no lua-resty library for publishing
from Lua/OpenResty to RabbitMQ, and writing a production-grade lua-resty AMQP
library would have required a lot of time (you can search our discussions on
the openresty-en mailing list). So I started by writing an opensource
STOMP-based library to use RabbitMQ's STOMP adapter, since STOMP is much more
lightweight and easier to implement than AMQP (which also has multiple
versions). In our Lua/OpenResty code we then had two options: publish messages
directly to RabbitMQ, or to Redis. Publishing to RabbitMQ was slow due to
network latencies and the AMQP overhead relative to our small payloads (less
than 1 kB); that was a deal breaker for us. Running Redis locally on unix
sockets, with our Lua/OpenResty code writing to it, was much better in terms
of latency, and transferring data in chunks from Redis to RabbitMQ improved
throughput.
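The Redis-to-RabbitMQ leg described above (the part agentredrabbit automates) can be sketched roughly like this. This is a minimal illustration, not Wingify's actual code; the queue name, chunk size, and wiring comments are assumptions.

```python
def drain_chunk(pop, max_chunk=100):
    """Drain up to max_chunk messages using pop(), a callable that
    returns the next queued message or None when the queue is empty
    (e.g. a Redis RPOP). The resulting chunk can then be published
    to RabbitMQ as a single message, amortizing the per-publish cost."""
    chunk = []
    while len(chunk) < max_chunk:
        msg = pop()
        if msg is None:
            break
        chunk.append(msg)
    return chunk

# Hypothetical wiring with real clients (redis-py and pika):
#
#   r = redis.Redis(unix_socket_path="/var/run/redis.sock")
#   chunk = drain_chunk(lambda: r.rpop("vwo:events"))
#   if chunk:
#       channel.basic_publish(
#           exchange="", routing_key="events",
#           body=json.dumps(chunk),
#           properties=pika.BasicProperties(delivery_mode=2))  # persistent
```

The win is that one RabbitMQ round trip (and one confirm) covers many small sub-1 kB events instead of one each.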

Coming back to the earlier comment on this thread, Redis does not add a
failure case; instead it provides us a failover. In my experience a whole
datacenter (network) is more likely to go down than a single server, and in
that case the data sits in local Redis. Our cron jobs, along with monitoring
tools, make sure the local services on each server are up and running. If one
of our servers goes down, our Anycast DNS (with a low TTL) switches traffic to
the other available servers automatically. If the RabbitMQ
servers/datacenter go down, data is pushed into RabbitMQ the next time the
network/server comes back up; reliable messaging ensures messages/chunks are
written to disk (fsync) so they persist if not yet consumed, and publisher
confirms give us reliability. If the consumers die, data sits in RabbitMQ.
Timeouts and latencies while moving data to RabbitMQ are handled by
agentredrabbit.
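The failover behaviour described here can be sketched as a small publish-or-requeue loop: try to publish a chunk on a confirmed channel, and park it back in local Redis on any failure so nothing is lost while the broker is unreachable. The function and queue names are illustrative, not from agentredrabbit.

```python
def publish_or_requeue(publish, requeue, chunk):
    """Attempt to publish one chunk; publish() should raise on timeout,
    broker outage, or a missing publisher confirm. On failure the chunk
    is handed to requeue() (e.g. an LPUSH back onto the local Redis
    list) so it can be retried when the broker is reachable again."""
    try:
        publish(chunk)   # e.g. pika basic_publish on a confirm_delivery channel
        return True
    except Exception:
        requeue(chunk)   # e.g. r.lpush("vwo:events:requeue", *chunk)
        return False
```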

Comments, questions welcome.

~~~
gaius
Is there a reason not to just use Tibco RV? Because all of these are long-
solved problems.

~~~
bhaisaab
We love opensource. AMQP was created by Wall Street giants, RabbitMQ fits our
case, and Tibco RV does not solve our particular problem in our particular
environment.

~~~
jzelinskie
Did you guys evaluate ZeroMQ? IIRC it's made by the same people who originally
designed AMQP. If you did, I'm curious what your conclusions were.

~~~
bhaisaab
Yes. Among the several queuing systems we played with, RabbitMQ just worked
out of the box with the features we wanted, such as reliability, publisher
confirms, and the queueing patterns (routing, topologies, fan-out, etc.); it
was easy to use and deploy. 0MQ gave us no message persistence or broker
implementation (having a broker decouples our producers and consumers); if we
had used 0MQ, many features we wanted would have had to be written ourselves,
which RabbitMQ provided out of the box. I think 0MQ is more of a framework
than a queueing system/platform like RabbitMQ, which just works.

------
karolisd
When I'm working on an A/B test in Visual Website Optimizer, the speed at
which the script is updated when I hit save never ceases to amaze me. I've
never had to hit refresh multiple times and wait for it, unlike with other
services. It's interesting to see how it works behind the scenes.

~~~
bhaisaab
Thanks for your comment. I'm the backend engineer at Wingify who wrote this
post. It talks mostly about data acquisition; I'll suggest that my team post
more technical details of our dynamic CDN, which serves the dynamic content.

------
qooleot
Very cool, and awesome on opensourcing the agentredrabbit code!

We had a somewhat simpler but related problem -- tons of data coming in via
RESTful services that we didn't want to flood directly into the db over
massive numbers of parallel connections -- and likewise solved it with Redis.
Since we used postgres as the backend, we
([http://www.ivc.com](http://www.ivc.com)) actually sponsored a project to
create a Redis FDW:

[https://github.com/pg-redis-fdw/redis_fdw](https://github.com/pg-redis-fdw/redis_fdw)

so we could batch up 1,000s of inserts into a single database insert, thus
reducing IO to disk.
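Setting the FDW aside, the batching idea itself can be sketched in plain Python: collect the buffered rows and emit one multi-row INSERT instead of thousands of single-row ones. The table and column names here are made up for illustration.

```python
def batch_insert_sql(table, columns, rows):
    """Build a single parameterized multi-row INSERT statement, so
    thousands of buffered rows become one round trip (and far fewer
    disk syncs) on the database side."""
    placeholders = ", ".join(
        "(" + ", ".join(["%s"] * len(columns)) + ")" for _ in rows
    )
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), placeholders
    )
    params = [value for row in rows for value in row]
    return sql, params

# Hypothetical use with psycopg2 after draining a Redis list:
#   sql, params = batch_insert_sql("events", ["ts", "payload"], rows)
#   cur.execute(sql, params)
```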

~~~
bhaisaab
Great, thanks for sharing.

------
danielovich
CQRS

