
That's exactly why businesses like Amazon that are concerned with both scale and availability are using non-relational databases that replace ACID with "eventual consistency" or other weakened/modified guarantees. Then you can have active-active replication while increasing (rather than decreasing) reliability in the presence of node failures and network partitions.
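
A rough sketch of the active-active idea (Python; the names and the last-write-wins merge are just one simple policy I'm making up for illustration, not Amazon's actual design):

    import time

    class Replica:
        """One node in an active-active pair; it accepts writes locally."""
        def __init__(self):
            self.data = {}  # key -> (timestamp, value)

        def put(self, key, value):
            # The write succeeds locally even if the peer is unreachable.
            self.data[key] = (time.time(), value)

        def merge(self, other):
            # Background anti-entropy pass: last-write-wins per key.
            # (LWW can drop concurrent updates, which is why Dynamo-style
            # systems use vector clocks instead.)
            for key, (ts, val) in other.data.items():
                if key not in self.data or ts > self.data[key][0]:
                    self.data[key] = (ts, val)

    a, b = Replica(), Replica()
    a.put("cart:42", ["book"])          # written in datacenter A
    b.put("cart:42", ["book", "dvd"])   # concurrent write in datacenter B
    a.merge(b); b.merge(a)              # replicas converge once they can talk again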



That's only applicable to applications that do not require ACID. I've seen architectures where sharding + active-passive replication was used to provide scalability without sacrificing reliability (ACID compliance).

Here's an interesting paper on sharding with Oracle vs. MySQL:

http://www.oracle.com/technology/deploy/availability/pdf/ora...


The CAP theorem is a very strong limit on providing both availability and consistency in any distributed system. In your sharding+replication example, what happens when the datacenters containing your master and slave lose their network link? There's no way you can maintain write availability for clients in both datacenters while also providing the ACID Consistency guarantee. (But systems like Dynamo or CouchDB can do so while guaranteeing eventual consistency.)
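
To make the partition case concrete, here's a toy sketch (Python; all the names are hypothetical) of the Dynamo-style approach: both sides keep accepting writes during the partition, and vector clocks let the system detect the conflict and hand both versions back for reconciliation later:

    def descends(a, b):
        """True if vector clock a has seen every event recorded in b."""
        return all(a.get(node, 0) >= count for node, count in b.items())

    class Versioned:
        def __init__(self, value, clock):
            self.value, self.clock = value, dict(clock)

    def reconcile(local, remote):
        """Keep the newer version, or keep both 'siblings' if they're concurrent."""
        if descends(local.clock, remote.clock):
            return [local]
        if descends(remote.clock, local.clock):
            return [remote]
        return [local, remote]  # concurrent writes: the application resolves them

    # During the partition each datacenter kept taking writes for the same key:
    dc1 = Versioned("addr=old", {"dc1": 2, "dc2": 1})
    dc2 = Versioned("addr=new", {"dc1": 1, "dc2": 2})
    print([v.value for v in reconcile(dc1, dc2)])  # ['addr=old', 'addr=new']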

After seeing some real-world, big-business problems solved with weakened consistency guarantees, I'm skeptical that there are as many problems that "need" ACID as most people think. Rather, I think that (a) most engineers have not yet been exposed to the reasonable alternatives to ACID, and so have not actually thought about whether they "need" it, and (b) most businesses do not yet have the knowledge they need to weigh ACID's benefits against its availability costs.


I agree with the CAP theorem, and it applies to my example. In my example, replication only provides a backup copy of the data and is not used by the application; that's why it's active-passive (it provides reliability). This configuration provides the highest level of data protection possible without affecting the performance of the master database/shard. This is accomplished by letting a transaction commit its writes to the master database; the same transaction is also written to the slave, but that write is optional and happens asynchronously. Two-phase commit is not performed.
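
Roughly, the commit path looks something like this (Python sketch; every name here is a stand-in, not any particular RDBMS's replication protocol):

    import queue
    import threading
    import time

    class SlaveUnavailable(Exception):
        pass

    def master_apply(txn):
        print("master committed", txn)   # stand-in for the real write path

    def slave_apply(txn):
        print("slave applied", txn)      # stand-in for applying the txn on the slave

    replication_log = queue.Queue()

    def commit_on_master(txn):
        master_apply(txn)            # the client's commit returns here
        replication_log.put(txn)     # the slave copy is queued, not waited on (no 2PC)

    def ship_to_slave():
        while True:
            txn = replication_log.get()
            try:
                slave_apply(txn)
            except SlaveUnavailable:
                replication_log.put(txn)   # a slow or down slave never blocks the master

    threading.Thread(target=ship_to_slave, daemon=True).start()
    commit_on_master({"id": 1, "op": "insert"})
    time.sleep(0.1)   # give the background shipper a moment in this toy example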


Yes, it's interesting. Yes, it compares Oracle and MySQL.

For balance it would be helpful to have an analysis from a more independent source than Oracle.


Sharding isn't scaling. It's not application-transparent, and it's an operational nightmare.


Sharding isn't scaling

Sure sharding is scaling, although there are several types of sharding with varying degrees of scalability. Usually people use a combination of database sharding (which is what I was talking about) and table sharding (with replication, clustering, etc.) to achieve scalability. I've encountered several highly scalable DB environments like these.
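
For what it's worth, the database-level sharding I mean boils down to routing by key; a toy sketch (Python, with made-up connection strings):

    import hashlib

    # Hypothetical DSNs: one physical database per shard.
    SHARDS = [
        "db://shard0.example.com/app",
        "db://shard1.example.com/app",
        "db://shard2.example.com/app",
    ]

    def shard_for(user_id):
        """Pick a shard deterministically from the sharding key."""
        digest = hashlib.md5(str(user_id).encode()).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    # Every query for this user hits the same physical database, so each
    # shard only stores (and only has to scale for) its slice of the data.
    print(shard_for(12345))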

It's not application-transparent

There are several databases that have features that can make sharding transparent to your application.

and it's an operational nightmare

It depends on the RDBMS you're using.



