

MySQL Sharding For A Site That Gets 5 Billion Views Per Month - paul_houle
http://www.jurriaanpersyn.com/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/

======
aristus
Also remember that the 5Bn pageviews is not the reason to shard -- it's the
total size and mutation rate ("read/write ratio") of your data. If your DB
fits in memory, or if your reads are 10X your writes, don't just shard because
it's cool.

------
mdasen
A wonderful read for anyone looking to learn about sharding. However, the
Final Thoughts section is probably the most important: don't do it unless you
have to. Sharding is a pain and will severely limit your ability to develop
new features quickly, and many sites will never need it (since they won't grow
that big). Databases are efficient and hardware is fast. Sometimes your user
base is larger than what those great things can handle, so it's good to know
what sharding is. But it isn't fun to do, so avoid it if you don't need it.

------
amix
If you plan to create a popular product that handles tons of data and tons of
users, then include sharding as soon as possible. Bolting sharding on after
you have lots of data, lots of code, lots of traffic and lots of users is a
nightmare and a worst-case scenario.

You can postpone it, though, if you are unsure how popular your product will
become. If you do postpone it, then be sure your joins are "sane": normally,
you don't really do joins in a sharded environment, as data is located on
different databases, so you will do yourself a major favor by avoiding joins
that will force you to re-model your data when you switch to sharding.
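
To make the join point concrete, here is a rough sketch of what usually replaces a JOIN once the two tables can sit on different servers. Everything in it is made up for illustration: the modulo scheme in `shard_for`, the `USERS` and `MESSAGES` data, and the in-memory lists standing in for separate MySQL servers. It is not how Netlog does it, just one common shape.

```python
from collections import defaultdict

NUM_SHARDS = 2  # assumed shard count, for illustration only


def shard_for(user_id):
    """Hypothetical routing rule: pick a shard by user id modulo shard count."""
    return user_id % NUM_SHARDS


# In-memory stand-ins for two separate database servers.
USERS = [
    {2: "bob", 4: "dave"},     # shard 0 holds even user ids
    {1: "alice", 3: "carol"},  # shard 1 holds odd user ids
]
# Messages live on the recipient's shard: (recipient_id, sender_id, text).
MESSAGES = [
    [(2, 1, "hi bob"), (2, 4, "lunch?")],
    [(1, 2, "hi alice")],
]


def inbox_with_sender_names(recipient_id):
    """Do in application code what `messages JOIN users` did on one server."""
    # Step 1: one query against the recipient's shard.
    rows = [m for m in MESSAGES[shard_for(recipient_id)] if m[0] == recipient_id]

    # Step 2: group sender ids by shard so each shard is asked only once.
    ids_by_shard = defaultdict(set)
    for _, sender_id, _ in rows:
        ids_by_shard[shard_for(sender_id)].add(sender_id)

    # Step 3: per-shard lookup (stand-in for SELECT ... WHERE id IN (...)).
    names = {}
    for shard, ids in ids_by_shard.items():
        for uid in ids:
            names[uid] = USERS[shard][uid]

    # Step 4: merge the two result sets by hand.
    return [(names[sender_id], text) for _, sender_id, text in rows]
```

If your queries already look like steps 1 to 4 on a single database, moving the tables apart later mostly means changing what `shard_for` returns.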

And in a sharded environment you also copy things around :)

~~~
aaronblohowiak
I like to abstract data access from the implementation of its storage, so
making changes to the storage methodology shouldn't/won't require substantial
changes to business or application logic. Keeping the separation clear is more
difficult for some people than others. One of the problems with the route you
are suggesting is that people don't know the future. Eliminate bottlenecks as
they start to come up, and don't tightly couple unrelated concerns.
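
A minimal sketch of the kind of separation I mean, with hypothetical names throughout (`UserStore`, `SingleDbUserStore`, `ShardedUserStore`, `greet`) and plain dicts standing in for real databases. The business logic only sees the interface, so swapping the storage layout does not touch it.

```python
from typing import Optional, Protocol


class UserStore(Protocol):
    """The only thing application code is allowed to know about storage."""

    def get_name(self, user_id: int) -> Optional[str]: ...
    def set_name(self, user_id: int, name: str) -> None: ...


class SingleDbUserStore:
    """One dict stands in for a single, unsharded database."""

    def __init__(self):
        self._rows = {}

    def get_name(self, user_id):
        return self._rows.get(user_id)

    def set_name(self, user_id, name):
        self._rows[user_id] = name


class ShardedUserStore:
    """Several dicts stand in for shards; routing by modulo is an assumption."""

    def __init__(self, num_shards):
        self._shards = [{} for _ in range(num_shards)]

    def _shard(self, user_id):
        return self._shards[user_id % len(self._shards)]

    def get_name(self, user_id):
        return self._shard(user_id).get(user_id)

    def set_name(self, user_id, name):
        self._shard(user_id)[user_id] = name


def greet(store: UserStore, user_id: int) -> str:
    """Business logic: identical no matter which store is passed in."""
    name = store.get_name(user_id)
    return f"Hello, {name}!" if name is not None else "Hello, stranger!"
```

The abstraction only holds up if the interface never exposes anything a sharded backend cannot provide cheaply, such as arbitrary joins.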

------
Hates_
40m+ active users and I'd never even heard of them!

~~~
samueladam
Because Netlog's audience is pretty different from Hacker News'.

Search for common names to get an idea:
<http://www.google.com/search?q=netlog+profile+for+julia>
<http://www.google.com/search?q=netlog+profile+for+john>

~~~
wesley
No need, they have a full people directory:

<http://nl.netlog.com/go/directory/> (Holland & Belgium; change the subdomain
for other areas)

