

Redis Sharding at Craigslist - jzawodn
http://blog.zawodny.com/2011/02/26/redis-sharding-at-craigslist/

======
Smerity
I love Redis a great deal but I can't wait for the persistence issues to be
worked out.

The fact that they're restricting the working set to 24GB on a 32GB machine
purely to prevent death by SAVE or BGSAVE is quite concerning. For my personal
use cases the SAVE itself can also take drastically long - and that becomes
even worse if you need to factor in a hosting solution with temperamental disk
I/O such as Amazon EC2.
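
(If you wanted to enforce a cap like that from the client side, something like
this redis-py sketch would do it - the eviction policy is my guess, not
anything from the post; they may well just size their shards instead:)

    import redis

    r = redis.Redis(host='localhost', port=6379)   # host/port are illustrative

    # Keep the dataset well under physical RAM so the fork for BGSAVE
    # (copy-on-write can nearly double resident memory) doesn't hit swap.
    r.config_set('maxmemory', str(24 * 1024 ** 3))   # the 24GB cap from the post
    r.config_set('maxmemory-policy', 'allkeys-lru')  # or enforce the cap in the app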

Redis Diskstore is something I'm eagerly waiting for, but for now the issues
with the Redis persistence layer mean I can't use Redis in my recent project.
Here's to a speedy release of Redis 2.4! =]

~~~
davidhollander
> _I can't wait for the persistence issues to be worked out._

Why use an in-memory database for persistence instead of for caching?
Considering that RAM is the scarcest and thus most valuable resource on nearly
every server, in-memory DBs like Redis are better used for storing already-
rendered data ready for output rather than every bit of raw data that always
needs to be duplicated or logged to disk at the end of the day anyway.
Persistence==disk!=memory.
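
As a rough sketch of what I mean by caching already-rendered output (the key
name, TTL and redis-py client here are just for illustration):

    import redis

    r = redis.Redis()   # assumes a local Redis instance

    def rendered_page(key, render, ttl=300):
        """Serve an already-rendered fragment from the in-memory cache,
        regenerating it on a miss; the raw data lives on disk elsewhere."""
        cached = r.get(key)
        if cached is not None:
            return cached
        body = render()            # whatever produces the final output
        r.set(key, body, ex=ttl)   # cache only the rendered result, with a TTL
        return body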

> _Redis Diskstore is something I'm eagerly waiting for_

If your data can be represented using hashmaps (unordered) and b+trees
(ordered), check out TokyoCabinet for on-disk persistence. It has the fastest
on-disk hashmaps and b+trees (of non-fixed size) and has been around for a
while.

~~~
Smerity
The point is Redis is beginning to move away from being just an in-memory
database. See my other comment for a more detailed answer as to the use case.

Persistent caches are an important commodity for many websites. For some
services, trying to handle standard traffic patterns with an empty cache is
suicide and can cause cascading failures across the board. Without a strong
persistence solution Redis can't be used here. antirez mentioned he considered
it a strange move for Reddit to use Cassandra rather than Redis as a
persistent cache[1], and I think the issues with Redis persistence may well
have caused that.

Additionally I still think there's reasonable ground for a database that's
primarily in-memory but drops least-used data off to disk. The vast majority
of the web follows a Zipfian/long-tail distribution, so although your "working
dataset" can be far larger than RAM, your actual "active working dataset" can
fit in there. Why trade away the advantages of an in-memory, data-structure-
driven database when almost all your queries can be satisfied in this manner?

[1]
[http://www.reddit.com/r/programming/comments/bcqhi/reddits_n...](http://www.reddit.com/r/programming/comments/bcqhi/reddits_now_running_on_cassandra/c0m548l)
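
To sketch the access pattern I'm describing (a toy Python illustration only -
the sizes and the shelve-backed overflow are arbitrary stand-ins, nothing like
how a real diskstore would work):

    from collections import OrderedDict
    import shelve

    class SpillingCache:
        """Keep the hottest N keys in RAM and push the least recently used
        ones out to a disk store; with a Zipfian access pattern almost all
        reads stay in memory."""

        def __init__(self, path, max_in_memory=1000):
            self.hot = OrderedDict()        # in-memory entries, LRU order
            self.cold = shelve.open(path)   # on-disk overflow
            self.max_in_memory = max_in_memory

        def get(self, key):
            if key in self.hot:
                self.hot.move_to_end(key)   # mark as recently used
                return self.hot[key]
            value = self.cold[key]          # fault the value in from disk
            self.set(key, value)
            return value

        def set(self, key, value):
            self.hot[key] = value
            self.hot.move_to_end(key)
            while len(self.hot) > self.max_in_memory:
                old_key, old_value = self.hot.popitem(last=False)
                self.cold[old_key] = old_value   # spill least recently used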

~~~
davidhollander
> _for a database that's primarily in-memory but drops least used data off to
> disk_

> _so although your "working dataset" can be far larger than RAM your actual
> "active working dataset" can fit in there_

To me that's the exact same pattern as caching, merely stated in reverse. If
you want full persistence, everything's going to have to hit the disk anyway,
no matter how you phrase it and how and when you store it. Regarding your user
logs problem stated below, that sounds easy to parallelize: I would
hash+modulo the username to a data server number, traverse a B+ tree of
dates/entries in order on the selected data server, and buffer or stream until
20 entries matching the username have been retrieved. High performance and
fully persistent.

Waiting for a new memory+disk database pattern that does everything more
intelligently and faster than every other disk+memory pattern seems like
procrastination of parallelization, which is the unavoidable long term answer
to these problems.

------
cagenut
For extra context, here's a podcast where Jeremy describes the rest of the
craigslist cluster architecture:

<http://perseus.franklins.net/hanselminutes_0199.mp3>

Thank you for taking the time to share this stuff; I've learned so much from
you.

------
apu
Great post! I had a few questions:

1\. Do the slaves write to disk? I assume that would incur no performance
penalty, as I believe slave writes don't slow down the master's performance --
only the rate of replication.

2\. Async operations seem like a great idea, and in fact I was toying around
with implementing something like that myself. How far along is your
implementation, and how does it seem to behave so far?

3\. About your proposed 'automatic slave read' idea: the way I understand what
you wrote, you would end up incurring the same cost on the master and the
slave, i.e., you make the request on the master, and if it takes longer than X
msecs, you then issue the same request on the slave. Instead, is there any
merit to having a Redis command that returns how much time a given command is
likely to take? It wouldn't have to be very precise -- I imagine even a rough
order of magnitude would be helpful for deciding whether to get the data from
a given master or from its slave.
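
(For context, here's the naive version of the pattern in #3 as I understand
it, assuming a redis-py-style client - the hostnames and the 50ms cutoff are
made up. Note the master still burns CPU on the slow request even after the
client gives up on it, which is the double cost I'm worried about:)

    import redis

    master = redis.Redis(host='redis-master', socket_timeout=0.05)   # illustrative
    slave = redis.Redis(host='redis-slave')

    def read_with_fallback(key):
        """Try the master with a short timeout; on a slow response,
        re-issue the same read against a slave."""
        try:
            return master.get(key)
        except redis.exceptions.RedisError:   # timeout or connection trouble
            return slave.get(key)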

~~~
jzawodn
#1: currently no, but I'm about to change that and have them do a SAVE every
few hours or so.
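
(Roughly along these lines - a cron job or a small loop hitting each slave;
the hostnames and interval are placeholders, not our real setup:)

    import time
    import redis

    SLAVES = ['slave1.example', 'slave2.example']   # placeholder hostnames

    def snapshot_slaves(interval_hours=4):
        """Ask each slave for a background snapshot every few hours so the
        master never pays the fork/disk cost itself."""
        while True:
            for host in SLAVES:
                redis.Redis(host=host).bgsave()   # non-blocking save on the slave
            time.sleep(interval_hours * 3600)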

#2: my implementation is pretty far along in my testing, but I do need to
polish it up, update the docs, and test some edge cases.

#3: the idea with the slave reads is to use them for stuff that is CPU
expensive or could slow down other requests (doing large multi-set operations,
for example).
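
(In other words the routing is by operation type rather than by timing -
something like this sketch, with made-up hostnames:)

    import redis

    master = redis.Redis(host='redis-master')   # illustrative hosts
    slave = redis.Redis(host='redis-slave')

    def get(key):
        # cheap point reads stay on the master
        return master.get(key)

    def big_union(*set_keys):
        # expensive multi-set work goes to a slave so it can't
        # slow down other clients on the master
        return slave.sunion(*set_keys)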

------
slay2k
No disrespect towards you, Jeremy, you're pretty awesome in my book
(literally.. my High Perf MySQL book).

But with Craigslist being the most closed-minded and developer-hating
organization I've ever come across, I don't particularly give a rat's ass what
it's built on.

I wish it wasn't so, because I generally love posts like this, but if a
dictator's employees start giving tours of the mansion, you certainly won't
find me dazzled by the motion-sensor water fountains..

~~~
jzawodn
Uh. What happened to evoke this response?

You can email jzawodn@craigslist if you don't want to say so in public, but
we're not anti-developer. We are anti-abuse (which we see A LOT of) and anti-
TOS violations.

So I'm curious to understand your position.

~~~
slay2k
Thanks Jeremy. I'll choose to post here, as I didn't intend for my original
comment to come off as hostile, so I'll try to clarify what I meant.

The comment stemmed from frustration at the fact that, for a very long time,
CL has been killing all creative efforts to improve or build on it. Mature
sites used by millions being shut down after years of operation despite loud
user protests[1][2], and others killed in utero[3], have been the norm. I
realize that it's easy to point to the TOS and call it a day, but the TOS is
neither consistent[4] nor clear.

Friends of mine are running startups that get some of their data from CL, and
try to stay under the radar because of fears and uncertainties. They aren't
doing anything remotely shady. To them, CL's decision to kill a site comes on
a whim, and nobody really knows what's considered okay WRT the TOS and what
isn't. I mean, if Craig himself can't provide a definitive answer[5], then
surely you can see how this could become frustrating for developers.

I'm not used to any kind of openness coming from CL on the dev front, and your
post has been the first one I've seen in that category, so I want to apologize
if you caught the brunt of my frustration.

[1] <http://blog.claz.org/#post-94>

[2] [http://downloadsquad.switched.com/2007/06/08/jim-
buckmeister...](http://downloadsquad.switched.com/2007/06/08/jim-buckmeister-
qanda-why-craigslist-banned-listpic/)

[3] [http://techcrunch.com/2009/12/01/craigslist-yahoo-pipes-
flip...](http://techcrunch.com/2009/12/01/craigslist-yahoo-pipes-flippity/)

[4] <http://www.housingmaps.com/> \-- one of several that have been up since
2005

[5] <http://romy.posterous.com/dont-be-evil-craigslist>

