

Ask HN: What is your database sharding strategy? - furiouslol

Need some feedback from experienced DBAs on how you would go about sharding a huge database. Assume you want an app-agnostic strategy that you can deploy with any application.

Range-, modulo-, or hash-based partitioning?

Table-level sharding or DB-level sharding?

What sort of architecture? Do you have a centralized lookup directory at the DB level or the app level? I don't like this idea, as I don't want a single point of failure.

What about a DHT architecture for the DB servers?

How do you deal with the loss of joins?
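For anyone comparing the three partitioning schemes the question lists, here is a minimal Python sketch of how each one maps a key to a shard. The range bounds and the function names are hypothetical, chosen only for illustration.

```python
import bisect
import hashlib

# Hypothetical exclusive upper bounds: shard 0 holds ids below 1,000,000,
# shard 1 holds ids below 2,000,000, shard 2 holds ids below 3,000,000.
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_shard(key: int) -> int:
    """Range partitioning: the shard is the first range whose bound exceeds the key."""
    idx = bisect.bisect_right(RANGE_BOUNDS, key)
    if idx >= len(RANGE_BOUNDS):
        raise ValueError(f"key {key} is past the last range; add a new range first")
    return idx


def modulo_shard(key: int, num_shards: int) -> int:
    """Modulo partitioning: works directly on integer keys."""
    return key % num_shards


def hash_shard(key: str, num_shards: int) -> int:
    """Hash partitioning: hash any key to an integer, then take the modulo.

    MD5 is used only because it is stable across processes and spreads keys
    evenly; Python's built-in hash() of a str is randomized per process.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With auto-incrementing ids, range partitioning keeps neighbouring rows together (handy for range scans) but sends all new rows to the last shard. Modulo and hash partitioning spread keys evenly, but changing `num_shards` reassigns most existing keys.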
======
eries
We were just discussing this over in this thread:
<http://news.ycombinator.com/item?id=296656>

I have had really good experience with a very simple centralized lookup
facility. It's very easy to keep that from becoming a single point of failure,
either by using replication or, if you insist, some form of hash-based
partitioning of the directory itself.
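A minimal sketch of that kind of lookup facility, assuming in-memory dicts stand in for the directory database and that replication is simulated by writing every replica synchronously. `ShardDirectory` and `DirectoryNode` are hypothetical names, not the actual system described here.

```python
import hashlib
from typing import Dict, List, Optional


def _stable_hash(key: str) -> int:
    """Stable across processes, unlike Python's built-in hash() of a str."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class DirectoryNode:
    """One partition of the directory, kept as several in-memory replicas."""

    def __init__(self, num_replicas: int = 2) -> None:
        self.replicas: List[Dict[str, int]] = [{} for _ in range(num_replicas)]

    def put(self, key: str, shard: int) -> None:
        # Simulated synchronous replication: write every copy.
        for replica in self.replicas:
            replica[key] = shard

    def get(self, key: str, failed: frozenset = frozenset()) -> Optional[int]:
        # Read from the first replica whose index is not marked as failed.
        for i, replica in enumerate(self.replicas):
            if i not in failed:
                return replica.get(key)
        raise RuntimeError("every replica of this directory partition is down")


class ShardDirectory:
    """Maps an entity key (e.g. a user id) to the data shard that holds it."""

    def __init__(self, num_partitions: int = 4, num_replicas: int = 2) -> None:
        self.partitions = [DirectoryNode(num_replicas) for _ in range(num_partitions)]

    def _partition(self, key: str) -> DirectoryNode:
        # Hash-partition the directory itself so no one node holds all of it.
        return self.partitions[_stable_hash(key) % len(self.partitions)]

    def assign(self, key: str, shard: int) -> None:
        self._partition(key).put(key, shard)

    def lookup(self, key: str, failed: frozenset = frozenset()) -> Optional[int]:
        return self._partition(key).get(key, failed)
```

Because each key's shard is stored explicitly, adding a data shard moves nothing: new keys are simply assigned to it.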

I'm a fan of doing the partitioning at the app-level, and doing it
horizontally rather than vertically.

The loss of joins is an issue, but in some cases you can vertically partition
the parts of the app that need those joins. For the others, I have had good
luck using a transactional message-passing system, where one shard can
reliably send a message to another shard.
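One common way to get reliable shard-to-shard messages is the transactional outbox pattern; it is a swapped-in technique here, not necessarily the system eries used. This sketch assumes in-memory SQLite databases stand in for shards, and the table and function names are hypothetical. A debit and its outgoing message commit in one local transaction, and the receiver deduplicates by message id, so redelivery is safe.

```python
import json
import sqlite3
import uuid
from typing import List


def make_shard() -> sqlite3.Connection:
    """One in-memory SQLite database standing in for a shard."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE outbox (msg_id TEXT PRIMARY KEY, target INTEGER NOT NULL,
                             body TEXT NOT NULL, sent INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE inbox (msg_id TEXT PRIMARY KEY);
    """)
    return conn


def transfer_out(shard: sqlite3.Connection, src: str, dst: str,
                 dst_shard: int, amount: int) -> None:
    """Debit locally and enqueue the credit message in ONE local transaction,
    so either both happen or neither does."""
    with shard:  # commits on success, rolls back on exception
        shard.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                      (amount, src))
        body = json.dumps({"account": dst, "amount": amount})
        shard.execute("INSERT INTO outbox (msg_id, target, body) VALUES (?, ?, ?)",
                      (str(uuid.uuid4()), dst_shard, body))


def apply_message(shard: sqlite3.Connection, msg_id: str, body: str) -> None:
    """Apply each message's effect only once: the inbox primary key drops duplicates."""
    with shard:
        cur = shard.execute("INSERT OR IGNORE INTO inbox (msg_id) VALUES (?)",
                            (msg_id,))
        if cur.rowcount == 1:  # first delivery of this message
            msg = json.loads(body)
            shard.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                          (msg["amount"], msg["account"]))


def relay(shards: List[sqlite3.Connection]) -> None:
    """Deliver unsent outbox rows (at-least-once). A crash between delivery and
    marking the row sent just causes a harmless redelivery later."""
    for shard in shards:
        rows = shard.execute(
            "SELECT msg_id, target, body FROM outbox WHERE sent = 0").fetchall()
        for msg_id, target, body in rows:
            apply_message(shards[target], msg_id, body)
            with shard:
                shard.execute("UPDATE outbox SET sent = 1 WHERE msg_id = ?", (msg_id,))
```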

~~~
furiouslol
Yes. I'm reading your slides now. Very interesting set of technologies you
have there. How do you integrate multiple languages? Do you use
Thrift?

Instead of a centralized lookup facility, has your team tried a decentralized
approach, where each DB server (node) holds a portion of the lookup
directory?
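The decentralized approach being asked about is often built on consistent hashing, as in many DHTs. A minimal sketch, assuming every node shares the same list of node names (the membership protocol a real DHT needs is not shown); `HashRing` is a hypothetical name.

```python
import bisect
import hashlib
from typing import Dict, List


def _point(label: str) -> int:
    """Stable 64-bit position on the ring (MD5 for spread, not security)."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring. Each node can keep its own copy of this small
    structure (just node names), so no central lookup service is needed."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes            # virtual points per node, to even out load
        self._points: List[int] = []    # sorted ring positions
        self._owner: Dict[int, str] = {}
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            p = _point(f"{node}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = node

    def lookup(self, key: str) -> str:
        """The owner is the first ring point clockwise from the key's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[i]]
```

Adding a node only moves keys onto that node, roughly 1/(N+1) of them with N existing nodes, but those keys still have to be copied over, which is the remapping cost weighed in the reply.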

~~~
eries
We mostly used simpler RPC mechanisms like XML-RPC or REST. But cross-language
calls (other than client-server) were really the exception. Probably the only
really performance-sensitive ones were to our Solr-based search.

Yes, you can easily move chunks of the address space around, but we decided
that the pain of remapping (for example, when you add a new machine) is
worse. We very rarely had a problem with our master (which we ran with
master-master replication; how's that for term overloading?). The few
exceptions were times when our own bad code ran amok and drove up load. It was
way better to invest in an automatic query killer.
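A minimal sketch of an automatic query killer, assuming MySQL (the thread doesn't say which database was used). It only decides which statements to kill from `SHOW FULL PROCESSLIST` rows; polling on a timer and executing the `KILL QUERY` statements through a driver are left out, and `ProcessRow`, `queries_to_kill`, and the threshold are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class ProcessRow(NamedTuple):
    """A few of the columns that MySQL's SHOW FULL PROCESSLIST returns."""
    id: int
    user: str
    command: str          # e.g. "Query", "Sleep", "Binlog Dump"
    time: int             # seconds spent in the current state
    info: Optional[str]   # statement text, or None when idle


def queries_to_kill(rows: Iterable[ProcessRow], max_seconds: int,
                    protected_users: frozenset = frozenset()) -> List[str]:
    """Return KILL QUERY statements for SELECTs running longer than max_seconds.

    KILL QUERY stops the running statement but keeps the client connection.
    Only SELECTs are targeted, a conservative choice for this sketch.
    """
    statements = []
    for row in rows:
        if row.command != "Query" or row.user in protected_users:
            continue
        sql = (row.info or "").lstrip().upper()
        if row.time > max_seconds and sql.startswith("SELECT"):
            statements.append(f"KILL QUERY {int(row.id)}")
    return statements
```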

~~~
furiouslol
I'm very interested in knowing how you guys grew so big so fast under the
radar without much evangelizing from the SV scene. Hope to read about this in
your next blog post!

~~~
eries
Just for you: [http://startuplessonslearned.blogspot.com/2008/09/sem-on-fiv...](http://startuplessonslearned.blogspot.com/2008/09/sem-on-five-dollars-day.html)

Would love your feedback, if this is helpful.

------
PJM
There are some good answers and examples for the questions about database
sharding here:

<http://www.codefutures.com/database-sharding/>

