

An Unorthodox Approach to Database Design : The Coming of the Shard - nickb
http://highscalability.com/unorthodox-approach-database-design-coming-shard

======
simpleenigma
I've been working with a programming language called Erlang for about two
years now, and their RAM based database called Mnesia almost forces you into
this 'sharding' style.

In Erlang you also get the ability to have the same database on multiple
computers or fragment the database into different servers that hold only a
certain piece.

It certainly was a trick to unlearn normalization techniques I had been using
for years, but once you get he hang of it the data really becomes or
efficient.

------
menloparkbum
The sharding approach seems to be gaining popularity. It seems like a good
datapoint in favor of the anti-database crowd. Why even bother using an RDBMS
when you end up having to take this approach in the end, anyway?

------
sanj
There's something familiar about this approach. It reminds me of JMP vs. LJMP
commands in processor instructions sets that had partitioned memory sets. Or
had a JMP command that was distance limited (often to +/- 32k).

Effectively, sharding takes advantage of the locality of information. As soon
as you need information outside of that locale, you need to make a more
expensive request.

~~~
nostrademons
It's a bit more than that - sharding is fundamentally a way of spreading
writes. You're limited in the number of writes you can make by the speed at
which the disk rotates, so this quickly becomes a bottleneck, particularly in
heavily-trafficked social sites. Sharding lets you spread your writes out
among an arbitrary number of machines, so you can grow with traffic.

Big sites that are primarily read-hungry often don't need to shard at all,
instead relying upon replication. Look at Craigslist - they have a single
master database, about 20 slaves, and an archive cluster for postings that
nobody looks at anymore. Or PlentyOfFish - 1 machine for billions of
pageviews. Social sites with lots of user-created content need to shard much
earlier - LiveJournal has a fraction of the users as Craigslist, but they had
to start sharding around 2002/2003.

There're locality concerns as well, but they tend to be secondary to the write
issue. Disks are big, and with good caching you can often avoid hitting the
disk at all.

~~~
jamongkad
"Or PlentyOfFish - 1 machine for billions of pageviews. Social sites with lots
of user-created content need to shard much earlier"

Is that really true? I find it hard to imagine...but something deep down
inside of me says "Yes it's possible". Oh my...

~~~
staunch
Saying it's accurate is the kindest way to put it:
<http://news.ycombinator.com/item?id=23497>

------
patrickg-zill
I think this is a re-invention of relatively well-known ways of optimizing
databases. still interesting to read about however.

~~~
nostrademons
It used to be called horizontal partitioning before Google started using the
sharding term with Google Filesystem. It's been around for a while. The new
part seems to be notion of pushing it down into the application code instead
of paying Oracle or IBM a few million dollars for a DBMS that handles it
automatically.

------
horatio05
How would one know what data to place in shards? Are there any "RDBMS to
Shard" resources out there?

