
It's not the scale, it's the reliability.

When I worked at Amazon.com we had a deeply ingrained hatred for all of the SQL databases in our systems. Now, we knew perfectly well how to scale them through partitioning and other means. But making them highly available was another matter. Replication and failover give you basic reliability, but that's very limited and inflexible compared to a real distributed datastore with master-master replication, partition tolerance, consensus and/or eventual consistency, or other availability-oriented features.

I think everyone is missing the boat by evaluating datastores in terms of scalability. Even at Amazon's size, database scaling is only occasionally a problem. By the time your site is that big, hopefully you've split your software into independent services, each with its own (relatively) small database. And for the services that do need a distributed cache or persistence layer, there's now a huge body of literature and software that solves those problems.

Reliability is another matter. Every business can benefit from improved reliability, regardless of size. And reliability is harder. Scaling is basically an engineering problem, which can be solved with the right architecture and algorithms. Reliability is more like security: it's a weakest-link problem and requires constant vigilance in all areas, including engineering, testing, operations, and deployment. Adding redundant components improves reliability against hardware failures (which are independent between hosts) but not software bugs (which typically are not). Many of the comments on http://news.ycombinator.com/item?id=859058 are missing this point.
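
To put some (made-up) numbers on the redundancy point: against independent hardware failures, each replica multiplies your protection; against a correlated software bug, extra replicas buy you nothing. A rough back-of-the-envelope sketch in Python:

    # Back-of-the-envelope availability math (illustrative numbers only).

    # Independent hardware failures: the system is only down when *every*
    # replica happens to be down at the same time, so redundancy helps a lot.
    p_host_down = 0.01
    replicas = 3
    p_outage_hw = p_host_down ** replicas      # 0.000001

    # Correlated software bug: the same bad deploy or poison input takes out
    # every replica together, so redundancy doesn't reduce the risk at all.
    p_bug = 0.01
    p_outage_bug = p_bug                       # still 0.01

    print(f"hardware-only outage probability: {p_outage_hw:.6f}")
    print(f"correlated-bug outage probability: {p_outage_bug:.6f}")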

Anything that mitigates reliability problems has huge potential for saving costs and increasing value. CouchDB, for example, doesn't even provide basic scaling features like automatic partitioning, but it does provide reliability features like distribution, replication, snapshots, record-level version history, and MVCC. That is what makes it interesting and useful in my opinion. Similarly, although people talk about Erlang as a concurrency-oriented language, it's really a reliability-oriented language. The concurrency in Erlang/OTP is not designed for high-speed parallel computation. Rather, it's one part of an overall strategy for reliability, a strategy which also includes the ability to update code in running processes, a standard distributed transactional database, and built-in process supervision trees. (Steve Vinoski wrote something similar in his blog a while ago: http://steve.vinoski.net/blog/2008/05/01/erlang-its-about-re...)
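
To make the MVCC/version-history point concrete: CouchDB refuses an update unless you present the revision you read, so concurrent writers get an explicit conflict instead of silently clobbering each other. A rough sketch against a local CouchDB over HTTP (the database name and document are placeholders):

    import requests

    BASE = "http://localhost:5984/mydb"   # placeholder database

    # Create a document; CouchDB returns its first revision (_rev).
    doc = {"item": "widget", "qty": 1}
    rev1 = requests.put(f"{BASE}/order-1", json=doc).json()["rev"]

    # An update must carry the revision we read -- record-level MVCC.
    doc.update({"qty": 2, "_rev": rev1})
    requests.put(f"{BASE}/order-1", json=doc)

    # A second writer still holding the old revision gets a 409 Conflict
    # instead of silently overwriting the newer data.
    stale = {"item": "widget", "qty": 99, "_rev": rev1}
    resp = requests.put(f"{BASE}/order-1", json=stale)
    print(resp.status_code)   # 409 -- the caller must re-read and retry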




"By the time your site is that big, hopefully you've split your software into independent services, each with its own (relatively) small database. And for the services that do need a distributed cache or persistence layer, there's now a huge body of literature and software that solves those problems."

Yes, yes, a thousand times, yes. We seem to be slowly moving toward this ideal at JTV as we grow, because scaling monolithic database architectures is a pain, and it isn't reliable. Even if you scale your database via replication or sharding, you've still got a global choke point for reliability and performance. If your database(s) get slow, if replication lags, if a server dies, or whatever, there's a good chance it will take down your whole site. On the other hand, if you use services, you can lose individual features without taking down everything.
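
The mechanics of "losing individual features" mostly come down to a timeout and a fallback around every service call. A minimal sketch, with a hypothetical recommendations service:

    import requests

    def related_videos(video_id):
        """Ask the (hypothetical) recommendations service, but never let it
        take the page down: on timeout or error, degrade to an empty list."""
        try:
            resp = requests.get(
                f"http://recs.internal/videos/{video_id}/related",
                timeout=0.2,          # fail fast instead of hanging the page
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            return []                 # the feature disappears; the site stays up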

I would go so far as to say that it's probably a good idea to build a service-oriented architecture from the start. It's only a bit harder to do, it keeps your code base cleaner and more orthogonal over time, and it makes scaling much easier, should the need ever arise.


Amazon's CTO certainly is a big proponent of SOA.

Here is a light interview on the subject (by none other than Jim Gray), published in ACM Queue in 2006.

http://portal.acm.org/ft_gateway.cfm?id=1142065&type=pdf


As I mentioned earlier, you can have a highly scalable "shared-nothing" architecture using existing RDBMSs. It depends on your design and implementation.

Also, when used in an active-active configuration, replication actually decreases the reliability of your application as you add replicated nodes, because you're increasing the number of points of failure in your application (although scalability increases). Replication increases reliability only when used in an active-passive configuration. I've run into this issue over and over again.
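
Put differently, if your application depends on every active node being up, adding nodes multiplies the ways it can go down; if any one surviving copy is enough, adding nodes does the opposite. A toy calculation (the numbers are invented):

    # Toy availability math (invented numbers) for n replicated nodes.
    p_up = 0.99                               # availability of a single node
    n = 3

    # Active-active where the application needs every node: a series system,
    # so overall availability drops as nodes are added.
    active_active = p_up ** n                 # ~0.9703

    # Active-passive where any one surviving copy will do: a parallel system,
    # so overall availability improves with every node added.
    active_passive = 1 - (1 - p_up) ** n      # ~0.999999

    print(active_active, active_passive)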


That's exactly why businesses like Amazon that are concerned with both scale and availability are using non-relational databases that replace ACID with "eventual consistency" or other weakened/modified guarantees. Then you can have active-active replication while increasing (rather than decreasing) reliability in the presence of node failures and network partitions.


That's only applicable to applications that do not require ACID. I've seen architectures where sharding + active-passive replication was used to provide scalability without sacrificing reliability (ACID compliance).
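
The shape of that design, roughly: every key maps deterministically to one shard, each shard is its own ACID master, and the standby only receives replicated changes. A schematic sketch (the hostnames and shard map are hypothetical):

    import hashlib

    # Hypothetical shard map: each shard is an independent ACID master with a
    # passive standby that only receives replicated changes.
    SHARDS = [
        {"master": "db-master-0.internal", "standby": "db-standby-0.internal"},
        {"master": "db-master-1.internal", "standby": "db-standby-1.internal"},
        {"master": "db-master-2.internal", "standby": "db-standby-2.internal"},
    ]

    def shard_for(user_id: str) -> dict:
        """Map a key to one shard; all reads and writes for that key go to the
        shard's master, so each transaction stays local to one ACID database."""
        h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
        return SHARDS[h % len(SHARDS)]

    print(shard_for("alice")["master"])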

Here's an interesting paper on sharding with Oracle vs. MySQL:

http://www.oracle.com/technology/deploy/availability/pdf/ora...


The CAP theorem is a very strong limit on providing both availability and consistency in any distributed system. In your sharding+replication example, what happens when the datacenters containing your master and slave lose their network link? There's no way you can maintain write availability for clients in both datacenters while also providing the ACID Consistency guarantee. (But systems like Dynamo or CouchDB can do so while guaranteeing eventual consistency.)
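
One way to picture the trade-off: during the partition both datacenters keep accepting writes, and the system reconciles when the link comes back. Dynamo-style systems use vector clocks and application-level merges; the sketch below uses a crude last-write-wins rule just to show the shape of eventual consistency:

    # Both sides of a partition keep taking writes; reconcile afterwards.
    replica_a = {}   # datacenter A
    replica_b = {}   # datacenter B

    def write(replica, key, value, timestamp):
        replica[key] = (value, timestamp)

    def reconcile(a, b):
        """After the partition heals, merge per key, keeping the newer write
        (real systems prefer vector clocks or semantic merges to wall clocks)."""
        merged = dict(a)
        for key, (value, ts) in b.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
        return merged

    write(replica_a, "cart:42", ["book"], timestamp=1)
    write(replica_b, "cart:42", ["book", "dvd"], timestamp=2)
    print(reconcile(replica_a, replica_b))   # {'cart:42': (['book', 'dvd'], 2)}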

After seeing some real-world, big-business problems solved with weakened consistency guarantees, I'm skeptical that there are as many problems that "need" ACID as most people think. Rather, I think that (a) most engineers have not yet been exposed to the reasonable alternatives to ACID, and so have not actually thought about whether they "need" it, and (b) most businesses do not yet have the knowledge they need to weigh ACID's benefits against its availability costs.


I agree with the CAP theorem, and it applies to my example. In my example, replication only provides a backup copy of the data and is not used by the application; that's why it's active-passive (it provides reliability). This configuration provides the highest level of data protection possible without affecting the performance of the master database/shard. It works by letting a transaction commit its writes on the master database; the same transaction is also written to the slave, but that write is optional and asynchronous. No two-phase commit is performed.
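
So the commit path touches only the master, and shipping the change to the standby is fire-and-forget. A schematic of that ordering (not any particular database's internals):

    import queue
    import threading
    import time

    replication_log = queue.Queue()     # stand-in for the log shipped to the standby

    def master_apply(txn):              # placeholder for the real master write
        print("master committed", txn)

    def standby_apply(txn):             # placeholder for the standby write
        print("standby applied", txn)

    def commit_on_master(txn):
        master_apply(txn)               # the only synchronous, durable step
        replication_log.put(txn)        # ship to the standby; the commit does NOT wait

    def standby_worker():
        while True:
            txn = replication_log.get()
            try:
                standby_apply(txn)      # applied later; may lag behind the master
            except Exception:
                pass                    # standby trouble never blocks master commits

    threading.Thread(target=standby_worker, daemon=True).start()
    commit_on_master({"id": 1, "balance": 100})
    time.sleep(0.1)                     # give the asynchronous apply a moment to run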


Yes it's interesting. Yes it compares Oracle and MySQL.

For balance it would be helpful to have an analysis from a more independent source than Oracle.


Sharding isn't scaling. It's not application-transparent, and it's an operational nightmare.


"Sharding isn't scaling"

Sure, sharding is scaling, although there are several types of sharding, with varying degrees of scalability. Usually people use a combination of both database sharding (which is what I was talking about) and table sharding (with replication, clustering, etc.) to achieve scalability. I've encountered several highly scalable DB environments like these.

"It's not application-transparent"

There are several databases that have features that can make sharding transparent to your application.

"and it's an operational nightmare"

It depends on the RDBMS that you're using.



