
Sharding Techniques at Mixpanel Engineering - suhail
http://code.mixpanel.com/sharding-techniques/
======
fleitz
I'm surprised no one has considered using CIDR notation for sharding. If you
SHA-256 the key you've got 256-bits of address space. You could store the
whole table very compactly and there are plenty of algorithms out there for
making the lookup VERY fast.

You could use DNS to transmit the server info eg. do a lookup on
a4.c7.3a.d4.ec.1f.5d.f4.c6.f6.5f.77.f3.71.f5.3d.fa.e8.ea.15.cluster.foo.com

In your DNS server just have a record for *.15.cluster.foo.com this
effectively gives you a /8 on a 256-bit routing space. To find all the servers
you need to query to query the entire keyspace just ask for an AXFR request on
cluster.foo.com. This system also makes mirroring and partitioning extremely
simple. Say you have disparate servers that can handle disparate amounts of
data? Just turn your smaller servers into /9s

Note if you're considering implementing this there are some patent
applications around it. Look in my profile if you want to code around the
patent application.

------
andrew311
This post is a great 101, but it doesn't provide much info on what kind of
setup you run at Mixpanel. I see you ditched Cassandra, though, and you
mention Riak. Are you using Riak, rolling your own sharding layer, or what?

------
akronim
Step 0 - have enough traction that you need to worry about sharding.

~~~
mahmud
Not only does Mixpanel have the traffic to justify this, they're also (were?)
a 2-guy shop doing things on shoe-string and sheer engineering.

Misplaced snark is misplaced.

