
Redis at Disqus - tswicegood
http://bretthoerner.com/2011/2/21/redis-at-disqus/
======
antirez
Thank you for this post. This is what we need as a community to improve: use
cases, and useful criticisms when things don't work well, so that we can find
new strategies.

It's cool to see that Redis works well for many things, but it will be even
cooler if diskstore, or any other approach, can make Redis more accessible
even when the performance gain of being in RAM is not enough to justify the
costs for some kinds of applications.

We are also working on cluster and faster .rdb persistence. So there are
interesting things going on, and fortunately we will have something new and
stable in a few hours, as 2.2.0 stable is going live very soon :)

~~~
bretthoerner
Yeah, we can't wait for diskstore. If you imagine the analytics use case for a
moment: super high speed is great, but I'd imagine 99% of the read requests
don't ask for anything older than a month. Older data could easily be pushed out
to disk, saving us a lot of RAM. For now we can still operate pretty easily in
RAM (we have a few machines dedicated to analytics and they're just storing
counters or sets of small values), but it'd be great to know we can grow a lot
more without needing to put more of our shards on their own physical machines.

We already run the latest from the 2.2 branch, I can/should go into how easy
that is in a followup.

~~~
antirez
Yes, it can be a good use case, but the new set of design decisions has new
limitations; nothing is free :)

For example: with VM there were a number of problems, like keys having to stay
in memory, super slow persistence, and so forth, but there was no speed
penalty if you always wrote against a small working set.

With diskstore, instead, data is _on disk_. The RAM is just a cache, and if
you configure diskstore with 'cache-flush-delay 60' you are telling Redis that
every dirty key should be flushed to disk within at most 60 seconds.

If there are many writes it is easy to hit the I/O write speed limit, and the
system starts to be I/O bound.

So diskstore is surely a solution for big data problems where writes are rare
compared to reads. If there are a lot of writes, you have to consider the
total I/O.

The ideal solution in your scenario is IMHO to keep data about the latest N
hours in an in-memory Redis instance, and move the historical data into a
diskstore-enabled instance. This way you have a full win, as the diskstore
instance will be used _only_ for reads, so it will provide the maximum
benefit, while the in-memory instance will keep the predictable, low-latency
characteristics of the default Redis configuration.
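
To make that split concrete, here is a minimal sketch from the application
side, assuming the plain redis-py client and two hypothetical instances (one
default in-memory instance for recent counters, one diskstore-enabled instance
for history); the hostnames, key scheme, and 30-day cutoff are made up for
illustration:

    from datetime import date, timedelta

    import redis

    # Assumed hosts: recent counters live in a default in-memory Redis,
    # historical data lives in a diskstore-enabled instance.
    recent = redis.Redis(host="redis-hot", port=6379)
    historical = redis.Redis(host="redis-diskstore", port=6379)

    HOT_DAYS = 30  # assumed cutoff for "recent" data


    def incr_daily_stat(forum_id, stat, day):
        """Writes only ever hit the in-memory instance."""
        recent.incr("stats:%s:%s:%s" % (forum_id, stat, day.isoformat()))


    def get_daily_stat(forum_id, stat, day):
        """Reads go to whichever instance owns that day's bucket."""
        key = "stats:%s:%s:%s" % (forum_id, stat, day.isoformat())
        cutoff = date.today() - timedelta(days=HOT_DAYS)
        client = recent if day >= cutoff else historical
        value = client.get(key)
        return int(value) if value is not None else 0

A periodic job that copies closed-out days from the in-memory instance to the
diskstore one (and deletes them from the former) would complete the picture.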

------
bretthoerner
I'm thinking about doing a second post with some actual code (some parts may
be specific to Python, Django, and Celery) if anyone is interested.

~~~
moeffju
I'm definitely interested.

Also, out of curiosity, what do you use to render the actual charts? I'm
working on an analytics package and can't decide on a charting engine that is
clientside and reasonably performant.

~~~
thedz
We went through several different iterations of the line charts. Initially, I
tried using SVG via Raphael, but it turned out to be too slow. Because it's
possible to have a significant number of data points, manipulating and
changing the SVG markup was causing too much of a performance hit.

Eventually we settled on using Flot, which is a canvas-based charting
solution. We made some changes to the core Flot code with some plugins to do
things like change line color and fill on hover, but overall vanilla Flot
served 90% of our needs.

Because Canvas is essentially a bitmap, the number of data points has much less
impact on the drawing layer.

Of note, we still use Raphael for pie charts, because, well, they look better
and aren't affected by large numbers of data points.

~~~
moeffju
Thanks for the info!

flot is what I settled on, too, but performance with a lot of datapoints was
not so great. Maybe it was just too much data. Also, the timeline view was
really annoying, but JS's Date functions are to blame for that. But with Flash
out of the question and Raphael even slower, I suppose it's the best option
for now.

~~~
thedz
How many datapoints are you using? One thing that I've done (though in the
end we didn't really need it) was use a resolution function to tune
the datapoints to fit the canvas width.

If your canvas is only, say, 400 pixels wide, then any time series with more
than 400 datapoints will lose detail -- there simply aren't enough pixels to
display them accurately. As such, you can use a resolution function to reduce
the series down to 400 points.
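
For illustration, a rough sketch of that kind of resolution function in
Python (not the code anyone in this thread actually used); the aggregate
argument leaves open whether a bucket shows its max, its average, or
something else:

    def downsample(points, width, aggregate=max):
        """Reduce a list of (timestamp, value) points to at most `width`
        points, one per bucket of consecutive samples. `aggregate` decides
        how a bucket collapses to a single value (max, min, a mean, ...)."""
        if len(points) <= width:
            return points
        bucket_size = len(points) / float(width)
        result = []
        for i in range(width):
            bucket = points[int(i * bucket_size):int((i + 1) * bucket_size)]
            if not bucket:
                continue
            # Keep the first timestamp of the bucket, aggregate its values.
            result.append((bucket[0][0], aggregate([v for _, v in bucket])))
        return result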

~~~
moeffju
I have way more; I downsample to temporary mongodb collections and display the
closest resolution for the available width.

I also found it hard to decide on a resolution function for analytics. Do we
show the maximum of a time range? The average? Median? Min and max?

------
r00k
HN folks: would any of you be interested in a 'Getting Started With Redis'
screencast?

Edit: if it were non-free :)

~~~
Hates_
Yes, I'd pay. Especially if it was Ruby/Rails based too.

------
geoffc
The most interesting thing about Redis is that it removes the impedance
mismatch between in-code data structures and the data store. It is doing for
data stores what server side javascript does for AJAX applications. OO
persistence was the first step in this direction but Redis nails the real
world use cases a lot better.

~~~
bretthoerner
Absolutely. We don't (currently) use it but amix's redis_wrap is an awesome
example if you use Python. It basically makes interacting with Redis look just
like you're dealing with normal builtin data structures. I'm sure the
equivalent exists or is just as easy to write in any language:
<https://github.com/amix/redis_wrap>
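
To illustrate the idea (this is not redis_wrap's actual API, just a sketch of
the concept using redis-py), a Redis-backed object that behaves like a
builtin list might look like:

    import redis


    class RedisList(object):
        """A thin wrapper that makes a Redis list feel like a Python list."""

        def __init__(self, key, client=None):
            self.key = key
            self.client = client or redis.Redis()

        def append(self, value):
            self.client.rpush(self.key, value)

        def __len__(self):
            return self.client.llen(self.key)

        def __getitem__(self, index):
            value = self.client.lindex(self.key, index)
            if value is None:
                raise IndexError(index)
            return value

        def __iter__(self):
            return iter(self.client.lrange(self.key, 0, -1))

    # Usage looks like a normal list:
    #   todo = RedisList("todo")
    #   todo.append("write that followup post")
    #   len(todo), list(todo)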

------
xal
We have been using it for sessions (amongst tons of other stuff) at Shopify
for half a year and found that we didn't have problems with increasing memory
after we started setting expiration bits on the session keys.
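
For example (a sketch with redis-py, not Shopify's actual code, and an
assumed two-week window), refreshing a TTL every time the session is saved
means only idle sessions ever expire:

    import redis

    r = redis.Redis()
    TWO_WEEKS = 14 * 24 * 60 * 60  # assumed expiry window, in seconds

    def save_session(session_key, data):
        name = "session:%s" % session_key
        r.set(name, data)
        # Refresh the TTL on every save; sessions that keep getting
        # touched stay around, idle ones are evicted by Redis itself.
        r.expire(name, TWO_WEEKS)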

~~~
bretthoerner
When do you expire your sessions?

If I recall correctly, Django defaults to a two week expiry after the last
session change. Remember that Disqus is a rather large network. We have many
millions of "active" sessions by the definition above. The load of
requests/second and number of sessions in VM really brought out the issues we
ran into. I imagine it'd work just fine for sessions for most sites - but I
still lean toward projects that specifically aim to be disk-backed k/v stores
(membase, etc).

------
nikz
When aggregating stats in this manner (by Day) how do people deal with Time
Zones?

For instance, if I have one user in, say, NZST, their "Tuesday, 22 February"
is still "Monday, 21 February" in PST - and the real issue is that the buckets
are off. So you can't just store in UTC and then move it by whatever timezone
offset, as then you are grabbing different "buckets".

I don't think that explanation is very clear (I had to draw a diagram to
figure it out myself). Hopefully someone smarter than I am can figure it out
anyway.

We've worked around it by just storing hour aggregates, but I'm interested in
case someone else has a smart solution :)

~~~
iampims
Flickr decided that UTC would be the default for their stats. As long as you
stick to it, it's not that big of a problem.

~~~
bretthoerner
Not if you're trying to save RAM (and operations to fetch said data) by
storing stats per day. You'd need all stats to be per-hour in order for it to
work for any timezone.

~~~
jrockway
If you store by day UTC, you'd need two fetches to get a day in some other
time zone. But if you store by hour, you need 24 fetches.

~~~
foobarbazetc
They can't in their (Disqus) case because they're aggregating all the stats
per day into a single value. I guess they could do it 'per account TZ' since
the account name is in the key, but that means TZ calc on each write (not that
that will make a huge difference in perf).

For a generic solution with easy TZ calc, you need to aggregate your stats
into hourly values instead (or half hourly if you care about those wacky non-
aligned timezones). The increased fetches don't matter because you can just
mget them.
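
A rough sketch of that approach with redis-py, assuming hourly counter keys
in UTC (the key naming and fixed-offset timezone handling are illustrative):

    from datetime import datetime, timedelta

    import redis

    r = redis.Redis()


    def local_day_total(forum_id, stat, local_date, utc_offset_hours):
        """Sum the 24 hourly UTC buckets that cover one local-time day."""
        # Midnight local time, expressed in UTC.
        start = datetime(local_date.year, local_date.month, local_date.day)
        start -= timedelta(hours=utc_offset_hours)
        keys = []
        for h in range(24):
            hour = (start + timedelta(hours=h)).strftime("%Y%m%d%H")
            keys.append("stats:%s:%s:%s" % (forum_id, stat, hour))
        # One MGET fetches all 24 buckets; missing hours come back as None.
        return sum(int(v) for v in r.mget(keys) if v is not None)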

------
sigil
Sharding: "We just take the modulo of the owning user's ID against the number
of nodes we have to decide which node to read/write from/to."

What's your procedure for adding new nodes to increase capacity? Would you
have to take your redis cluster offline to redistribute data from all nodes
over the new keyspace?

I like the simplicity of your approach, but wonder if consistent hashing might
be a bigger win in the long run.

~~~
bretthoerner
> What's your procedure for adding new nodes to increase capacity? Would you
> have to take your redis cluster offline to redistribute data from all nodes
> over the new keyspace?

It's not easy, actually. The short answer is that we don't add capacity
(because we've only needed to once, and we have tons of room to grow now). The
long answer is that I have a switch I can flip that starts incrementing/adding
data to a whole new cluster of Redis nodes while it still updates the old
ones. We can then backfill all data to the new nodes and when they're set up,
flip a switch to read/write only from/to the new nodes. It may sound a bit
weird, mostly because it is. Moving sets of random keys from one node to the
other _while_ you're expecting live reads/writes is a huge pain, so I just
punted on the problem.

(I elaborated a little more in a comment on my post:
[http://bretthoerner.com/2011/2/21/redis-at-
disqus/#comment-1...](http://bretthoerner.com/2011/2/21/redis-at-
disqus/#comment-153603719))
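
A condensed sketch of that switch (hypothetical names; the sharding is the
modulo-of-user-id scheme quoted above):

    import redis

    # Old and new shard sets; hosts are placeholders.
    OLD_NODES = [redis.Redis(host="redis-old-%d" % i) for i in range(4)]
    NEW_NODES = [redis.Redis(host="redis-new-%d" % i) for i in range(8)]

    DUAL_WRITE = True      # write to both clusters while backfilling
    READ_FROM_NEW = False  # flipped once the backfill has caught up


    def shard_for(user_id, nodes):
        # Modulo of the owning user's id against the number of nodes.
        return nodes[user_id % len(nodes)]


    def incr_stat(user_id, key):
        shard_for(user_id, OLD_NODES).incr(key)
        if DUAL_WRITE:
            shard_for(user_id, NEW_NODES).incr(key)


    def get_stat(user_id, key):
        nodes = NEW_NODES if READ_FROM_NEW else OLD_NODES
        return shard_for(user_id, nodes).get(key)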

> wonder if consistent hashing might be a bigger win in the long run.

I'm not sure that it's applicable. Consistent hashing is really handy for
caches when you don't want everything to miss as soon as 1/N servers drop out
of the ring. You have to (imo) think of each Redis shard as a "real" DB. If
your master PostgreSQL instance dies, you don't just start reading from
another random instance and returning "None" for all of your queries. If a
shard goes down, you either depend on an up-to-date read slave or nothing at
all. I'm not sure how consistent hashing helps when adding nodes to a "real"
DB, either. Say Node1 holds all of the data for CNN, you add a new node to the
ring and now some % of CNN keys go to that new node. Now all of your writes
are updating new/empty keys and all your reads are reading those new/empty
keys. How does consistent hashing help with the migration?

(I'm really asking, because if I'm missing something I'd _love_ to know.)

~~~
sigil
Thanks for the info!

> I'm not sure how consistent hashing helps when adding nodes to a "real" DB,
> either ... How does consistent hashing help with the migration?

Instead of backfilling all data to an entirely new cluster, you'd only
backfill the small amount of data from the keyspace "stolen" by the new node,
and expire the keys at the original locations. If you use M replicas of each
node around the ring (typically M << N) you only involve M+1 nodes in the
migration process.

I'm still experimenting with this idea myself, and would also love to know if
anyone's tried something similar with data store sharding (not just with
caching).

~~~
bretthoerner
> you'd only backfill the small amount of data from the keyspace "stolen" by
> the new node

I think this is the part I'm not so sure about.

Say I have 100 stats, and of course each stat is per forum, per day (going
back from 1 day to ... 5 years?). How do I know what keys were just "stolen"?
Do I have my new-node code hash every possible key (all stats for all forums
for all hours for all time) to see which might go to that node? And then it
reverses that key to know what it "means" to backfill it? (I need to do that
followup post as the way our data 'flows' in is applicable here)

~~~
sigil
> How do I know what keys were just "stolen"? Do I have my new-node code hash
> every possible key (all stats for all forums for all hours for all time) to
> see which might go to that node? And then it reverses that key to know what
> it "means" to backfill it?

Right, you'd have to iterate through all zset elements on the existing node,
applying the consistent hash function to decide whether or not the element
will be stolen by the new node.

If the element itself doesn't contain the user id (or whatever you shard on),
all bets are off.
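
A toy sketch of that check (not a production consistent-hashing
implementation), assuming the shard key can be recovered from each element:

    import bisect
    import hashlib


    class HashRing(object):
        """Each node gets several points on a ring of md5 hashes."""

        def __init__(self, nodes, replicas=64):
            self.ring = []
            for node in nodes:
                for i in range(replicas):
                    point = self._hash("%s:%d" % (node, i))
                    self.ring.append((point, node))
            self.ring.sort()

        def _hash(self, key):
            return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

        def node_for(self, key):
            idx = bisect.bisect(self.ring, (self._hash(key), ""))
            return self.ring[idx % len(self.ring)][1]


    old_ring = HashRing(["node-a", "node-b", "node-c"])
    new_ring = HashRing(["node-a", "node-b", "node-c", "node-d"])

    def stolen_by_new_node(shard_key):
        # Only elements whose key now hashes to the new node need to move.
        return new_ring.node_for(shard_key) == "node-d"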

------
kore
> While the VM backend helped, we found that it still wouldn't stay within the
> bounds we set, and would continually grow no matter what we set. We did
> report the issue but never came to a good solution in time. For example, we
> could give Redis an entire 12GB server and set the VM to 4GB, and given
> enough time (under high load, mind you) it would climb well above 12GB and
> start to swap, more or less killing our site.

We came across this same issue while implementing a Redis-based solution to
improve the scalability of our own systems. Someone filed an issue reporting
this: <http://code.google.com/p/redis/issues/detail?id=248>.

Basically, antirez confirms that Redis does a poor job estimating the amount
of memory used, so you'll need to adjust your redis.conf VM settings to take
this into account. For anybody relying on Redis's VM, I'd recommend writing a
script to load your server with realistic data structures with sizes you
expect in production. You can then compare Redis's configured memory usage
against the actual memory usage at which swapping starts, and set your
redis.conf according to the limitations of your box. For example, we run Redis
2.0.2, and using list structures with ~50 items of moderate size, we found
configuring Redis to use 400MB actually resulted in it using up to 1.4GB
before swapping. We configure our settings to take this into account. Mind
you, this may all change with diskstore, and later versions of Redis which are
supposed to be more memory efficient.
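
A sketch of that kind of load script with redis-py (the item size, list
length, and key names are placeholders for whatever production data actually
looks like):

    import redis

    r = redis.Redis()

    ITEM = "x" * 128  # stand-in for an item of "moderate size"

    for i in range(100000):
        key = "load:list:%d" % i
        for _ in range(50):  # roughly 50 items per list, as above
            r.rpush(key, ITEM)
        if i % 10000 == 0:
            # What Redis believes it is using; compare it against the RSS
            # that top/ps reports for redis-server to see where the
            # estimate drifts and swapping would begin.
            print("%d %s" % (i, r.info().get("used_memory_human")))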

For those curious, our Redis-based solution is helping us scale some write-
heavy activities quite nicely, and has been running stably.

------
DEinspanjer
So frustrating to see all these people doing cool things with Redis and not
having people free to do that stuff here. :) Any Redis hackers looking for a
job or even some contract time? Big advantage is all the work will be open
source and able to be shared and blogged.

~~~
bretthoerner
I'm the author. What about a junior developer with an interest in Redis? This
stuff isn't hard; I have recommendations. E-mail in profile. :)

------
dougk7
I started experimenting with Redis over the last couple of weeks and I'm really
loving its power. I mostly rely on posts like these to find new ways to use it.

And your way of sharding definitely gave me more insight into distributing
Redis across many nodes.
