

Ask HN: "Simple" sharding tutorials using MySQL or PostgreSQL? - sanj

Hiya.<p>I'd appreciate pointers to tutorials on "simple" sharding using MySQL or PostgreSQL.<p>What I mean by "simple" is:<p>- Tables are designed so each one has a field that can be used to identify the correct partition to use (in my case, user_id)<p>- Data/Queries are organized so that the vast majority of manipulation happens on a single partition<p>- Rebalancing shards is an offline procedure<p>I'd love to push the details of disks/servers/processes that handle the queries down into the DB layer and not worry about it at the app (in my case, Rails) layer: give me a single entry point that can be shared across all of my app servers. We can play DNS games if need be to distribute that load.<p>Bonus points if a tutorial talks about how to set up replication in this context too.<p>It seems like <i>many</i> people would be interested in this.
======
alexstaubo
Sharding is icky and hard because it rubs relational databases against the
grain. It also, incidentally, goes against the conventions of most web
frameworks, including Rails. Since there's no database that does it for you,
you really have to design for it from the beginning. Clay Shirky's scalability
book, with anecdotes from Flickr's early development, is a must-read.

The Skype people do transparent sharding with PostgreSQL (the
highscalability.com guy blogged about it here:
[http://highscalability.com/skype-plans-postgresql-scale-1-bi...](http://highscalability.com/skype-plans-postgresql-scale-1-billion-users)).
They accomplish this using their own PL/Proxy plugin
(<http://developer.skype.com/SkypeGarage/DbProjects/PlProxy>). The only way
they can do this is by _never_ doing SQL in the client. Instead, all SQL is
wrapped in server-side functions, aka stored procedures (written in
PL/Python). The magic in PL/Proxy is to enable the execution of these
functions based on a hash. So for example, a function get_user_email(username)
is implemented as:

    create function get_user_email(username text) returns text as $$
        cluster 'userdb';
        run on hashtext(username);
    $$ language plproxy;

...which results in the real function being executed on some other server.

There are pros and cons to this approach. Could they not have implemented this
in the application? Yeah. The additional latency of doing one extra remote
call (even if it can be pipelined) can't be good.

Another way of automating sharding is to use a middle layer like PgPool,
GridSQL or Continuent uni/cluster. The first two are open-source projects,
the last is commercial; PgPool is written in C, the other two are Java
(although GridSQL has C bindings).

PgPool is a proxy that can also do replication, partitioning, load-balancing
and parallelization -- and you can pretty much pick which ones you want.
PgPool intercepts all SQL statements, inspects them, possibly rewrites them,
and sends them to one of several backend PostgreSQL servers. For example, it
can route all updates to multiple PostgreSQL servers, ensuring that they're
identical (until one of them goes offline, at which point you have to manually
pull it back into the pool). It can also partition data, by checking inserts
and routing them to the right box, and you can write the partitioning
functions in PL/pgSQL. And it can also route queries semi-intelligently, so
that a query that is known to only touch a single partition will only go to
that partition. And it can parallelize them, so that when you do "select *
from foo", it'll run the query on all servers and then combine the results.

GridSQL covers the same ground as PgPool, but seems a little more advanced. It
has a complete SQL parser that is supposed to be able to do query routing and
rewriting more efficiently. From what I can see, they are emphasizing
performance and parallel queries above anything else.

I haven't tried uni/cluster, but it's similar in scope and features.

The main problem with these products is that they themselves become a
bottleneck. I don't think you can scale them horizontally by just piling on
more proxies -- the proxies themselves are gatekeepers. So you're just trading
one bottleneck for another.

After implementing a couple of large, popular, bottom-heavy, hard-to-scale
Rails apps, I am now of the opinion that well-designed apps should never talk
directly to a database: doing so makes the database layer a bottleneck, which
is particularly bad with relational databases, since they cannot scale very
far horizontally.

Instead, you should have the application serve an internal API that can be
broken up and multiplied and moved. For example, consider the get_user_email
function from the example above. Consider the Rails way:

User.find(params[:id]).email

You are already tied to a very specific code path -- find the object, read the
email. To shard this thing, you have to override the find method, as well as
any update methods, and you will end up with a graceless patch on top of
ActiveRecord. What if we made the use case -- getting the email address of a
user -- explicit, as a real API?

User.get_email(params[:id])

Similarly, you would "sculpt" dedicated methods to handle more complicated
queries. Consider:

User.get_latest_comments(params[:id], :in_forum => params[:forum_id])

Now we have isolated ourselves from the database, and we have an API that can
actually be sharded. In fact, what we have is an _abstraction_. Abstractions
are useful. Sure, the implementation can use ActiveRecord or whatever, but
it's no longer dependent on anything except the input. It's not dependent on
database model details like which table holds the email address -- the
abstraction separates the _concern_ of the application from the data API.
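A minimal sketch of what an implementation behind such an API could look like, in plain Ruby (no ActiveRecord here; the shard registry, shard_for and UserAPI are all invented names for illustration -- in a real Rails app each shard entry would be a connection set up via establish_connection):

```ruby
SHARD_COUNT = 4

# Same user_id always maps to the same shard.
def shard_for(user_id)
  user_id % SHARD_COUNT
end

# Stand-in for per-shard database handles.
SHARDS = Array.new(SHARD_COUNT) { |i| { :name => "users_shard_#{i}", :rows => {} } }

module UserAPI
  # The only way the app reads an email -- callers never see shards.
  def self.get_email(user_id)
    row = SHARDS[shard_for(user_id)][:rows][user_id]
    row && row[:email]
  end

  def self.set_email(user_id, email)
    SHARDS[shard_for(user_id)][:rows][user_id] = { :email => email }
  end
end
```

The point is that get_email is the abstraction boundary: swapping the hash-based registry for real connections (or for a remote call) changes nothing for callers.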

If we have a single app database and it doesn't scale, we can move all the
users over to a different server with its own database and make the API be
remote, based on a REST or dRb interface. If this user server becomes too
slow, we can just add another user server -- "just add another box" is the
core tenet of any scale-out strategy.
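dRb ships with Ruby, so the "make the API remote" step needs very little code. A toy sketch (the service class, port, and in-memory data are all invented for illustration):

```ruby
require "drb/drb"

# A "user server": the same API as before, now answering over the wire.
class UserService
  def initialize
    @emails = { 1 => "alice@example.com" }  # stand-in for the user DB
  end

  def get_email(user_id)
    @emails[user_id]
  end
end

# Server side -- in production this runs on its own box:
DRb.start_service("druby://localhost:8787", UserService.new)

# Client side -- inside the Rails app; callers are unchanged:
remote_users = DRbObject.new_with_uri("druby://localhost:8787")
remote_users.get_email(1)  # same call, now a remote one
```

Adding capacity then really is "add another box": run more UserService instances and spread clients across them.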

So if I was building a new app today, that's what I would do.

~~~
toffer
One minor correction to your very helpful post: the scalability book you are
referring to is "Building Scalable Web Sites" by Cal Henderson (not Clay
Shirky).

<http://oreilly.com/catalog/9780596102357/>

~~~
alexstaubo
Of course. Thanks.

------
dzohrob
Most of the methods people use for sharding/partitioning end up in the
application layer, because open-source DBs just aren't fancy enough yet to
handle this.

Mysql-proxy (<http://forge.mysql.com/wiki/MySQL_Proxy>) looks promising, but
it's still in an alpha state. You could try writing a proxy script in Lua --
there's already some master/slave stuff done in this area.

Mysql 5.1 supports partitioning, but not really in the way that you and I
mean.

The problem is that sharding breaks some fundamental SQL axioms -- for
example, doing SELECTs no longer returns all rows (depending on how the table
is partitioned).
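Concretely, once rows are spread across shards, a plain "select * from users" becomes a scatter-gather the app has to do itself. A toy illustration in plain Ruby (arrays standing in for shard connections; all names invented):

```ruby
# Two shards; rows were placed by id % 2, so shard 0 holds even ids.
SHARD_DATA = [
  [{ :id => 2, :email => "c@x.com" }, { :id => 4, :email => "d@x.com" }],
  [{ :id => 1, :email => "a@x.com" }, { :id => 5, :email => "b@x.com" }],
]

# "select * from users": query every shard and merge the results.
def all_users
  SHARD_DATA.flat_map { |rows| rows }
end

# "select * from users where id = ?": only one shard needs to be asked.
def find_user(id)
  SHARD_DATA[id % SHARD_DATA.size].find { |row| row[:id] == id }
end
```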

Almost everyone I know who does this rolls their own layer. As subwindow
mentioned, if you're using Rails, you have to futz with establish_connection
due to ActiveRecord's assumptions about one DB connection per class. It's
possible -- we've done it -- but it requires lots of connections to the DB and
it can be messy. If you're on Rails, it might be easier to scale vertically
first before going horizontal.

Replication doesn't need to change in this context -- the Mysql documentation
should be all you need to get started.

------
subwindow
When I've planned out sharded infrastructures, the database usually wasn't
that big of a concern. The web framework or system architecture are usually
the pain point. And with Rails, you have to abuse establish_connection if
you're going to have each web head read from multiple shards.

The easiest route I've gone when setting up a sharded infrastructure is to use
subdomains and a 1-1 Web:DB setup. Have each subdomain go (either through a
reverse proxy or a hardware load balancer) to a different (sharded) web head.
Each web head talks to two databases -- the common database and its sharded DB.
With this you'll probably want a "common" web head to handle home page traffic
and authentication (after they are authenticated they'd get dished off to
their shard).
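The routing piece of this scheme is just a subdomain-to-backend lookup, whether it lives in the reverse proxy config or a thin layer of code. A hypothetical sketch (hostnames and backend addresses invented):

```ruby
# What the reverse proxy / load balancer effectively does: map the
# subdomain to a shard's web head, with a common head as the default.
BACKENDS = {
  "shard1" => "http://web1.internal:3000",
  "shard2" => "http://web2.internal:3000",
}
COMMON = "http://common.internal:3000"  # home page + authentication

def backend_for(host)
  subdomain = host.split(".").first
  BACKENDS.fetch(subdomain, COMMON)
end
```

After login, the common head would redirect the user to their shard<N> subdomain, and every subsequent request lands on the right 1-1 web/DB pair.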

Salesforce.com was my inspiration for this method, and it is probably
reasonably common. It probably also has a name, but I do not know what it is.

~~~
sanj
I've seen folks approach things this way -- Dr. Nic's done some of it in an
exploratory manner: <http://tinyurl.com/36twmo>

But I don't think it is the best approach. I'd much rather push ALL of the
magic down into the database layer and not have the app worry about it at all.
I want ONE layer of magic and I'd prefer it to be in the DB, where it seems it
"should" be.

~~~
cstejerean
I'd rather have the layer of magic be in the part of the code I understand
best so I can fix it when things go wrong.

~~~
sanj
I'm pretty confident I can learn what I need about DB configs, so I'm more
concerned about it being in the "right" place from a complexity and efficiency
standpoint.

------
aston
The 2nd edition of High Performance MySQL might interest you:
<http://www.amazon.com/dp/0596101716>

It covers, among tons of other stuff, some new features in MySQL 5.1 that make
partitioning easier.

------
rit
I don't know about Rails (its DB library is ActiveRecord, if I recall names
correctly), but SQLAlchemy, which is Python's big ORM, has built-in support
for sharding and similar partitioning.

When we built sketchcast.com, sharding was on the design list and a concern --
there were some fantastic articles at <http://highscalability.com/> that we
used as a basis for some of our design.

------
lpgauth
Not exactly sure if this can help you but I guess you could implement the
logic using dbslayer.

[http://github.com/harrisj/activerecord-dbslayer-adapter/tree...](http://github.com/harrisj/activerecord-dbslayer-adapter/tree/master)

------
sanj
It seems like I'm not the only one searching.

Here's something from my own archives:

[http://www.postgresql.org/docs/8.1/interactive/ddl-partition...](http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html)

~~~
alexstaubo
PostgreSQL table partitioning has nothing to do with sharding. It doesn't let
you split a database across multiple machines. You _can_ use it to split a
single table across multiple disks (using tablespaces), but that's not the
same thing.

------
gaius
You probably want this 1986 Stonebraker paper, "The Case for Shared Nothing":
<http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf>

~~~
sanj
No, I understand how/why I want to do sharding and its implications. I'm
looking for something much more tactical and practical.

------
prakash
google search sharding site:highscalability.com

<http://tinyurl.com/5znhvu>

~~~
sanj
highscalability.com is a great site that provides conceptual level information
about sharding.

But it doesn't provide anything approaching a cookbook or tutorial.

