
As I mentioned earlier, you can have a highly scalable "shared-nothing" architecture using existing RDBMSs. It depends on your design and implementation.

Also, when used in an active-active configuration, replication actually decreases the reliability of your application as the number of replicated nodes grows, because you're increasing the number of points of failure in your application (although scalability increases). Replication increases reliability only if used in an active-passive configuration. I've run into this issue over and over again.
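One way to put numbers on this claim is a minimal sketch under two assumptions that are not in the comment: every node has the same availability `p`, and nodes fail independently. The function names are made up for illustration. The first models a setup where the application needs every node up, the second a failover setup where any surviving node will do.

```python
def all_nodes_required(p: float, n: int) -> float:
    """Availability when the application depends on all n nodes being up."""
    return p ** n


def any_node_suffices(p: float, n: int) -> float:
    """Availability when any one of n nodes can take over (failover)."""
    return 1 - (1 - p) ** n
```

With `p = 0.99` and `n = 3`, the first gives about 0.9703 and the second about 0.999999. Which formula applies depends on whether the application can keep running when one node is gone.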

That's exactly why businesses like Amazon that are concerned with both scale and availability are using non-relational databases that replace ACID with "eventual consistency" or other weakened/modified guarantees. Then you can have active-active replication while increasing (rather than decreasing) reliability in the presence of node failures and network partitions.

That's only applicable to applications that do not require ACID. I've seen architectures where sharding + active-passive replication was used to provide scalability without sacrificing reliability (ACID compliance).

Here's an interesting paper on sharding with Oracle vs. MySQL


The CAP theorem is a very strong limit on providing both availability and consistency in any distributed system. In your sharding+replication example, what happens when the datacenters containing your master and slave lose their network link? There's no way you can maintain write availability for clients in both datacenters while also providing the ACID Consistency guarantee. (But systems like Dynamo or CouchDB can do so while guaranteeing eventual consistency.)
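The trade-off can be shown with a toy model. This is a sketch, not any real database's protocol: two sites are plain dicts, replication is synchronous while the link is up, and the `write` function and its `mode` values are hypothetical names.

```python
def write(site: dict, peer: dict, key, value, link_up: bool, mode: str) -> bool:
    """Apply a write at `site`, replicating to `peer` when the link is up.

    mode "consistent": refuse the write if the peer cannot be reached.
    mode "available":  accept the write locally and let the sites diverge.
    """
    if link_up:
        site[key] = value
        peer[key] = value
        return True
    if mode == "consistent":
        return False  # stays consistent, but this client cannot write
    site[key] = value  # stays writable, but the two copies now disagree
    return True
```

During a partition, "consistent" keeps both copies identical by rejecting writes, while "available" accepts writes on both sides and leaves the divergence to be reconciled later.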

After seeing some real-world, big business problems solved with weakened consistency guarantees, I'm skeptical that there are as many problems that "need" ACID as most people think. Rather, I think that (a) most engineers have not yet been exposed to the reasonable alternatives to ACID, and so have not actually thought about whether they "need" it, and (b) most businesses do not yet have the knowledge they need to weigh ACID's benefits against its availability costs.

I agree with the CAP theorem and it applies to my example. In my example, replication only provides a backup copy of the data and is not used by the application; that's why it's active-passive (provides reliability). This configuration provides the highest level of data protection that is possible without affecting the performance of the master database/shard. This is accomplished by allowing a transaction to commit writes to the master database. The same transaction is also written to the slave, but that write is optional and happens asynchronously. Two-phase commit is not performed.
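The commit path described here can be sketched roughly as follows. This is an in-memory illustration only: the `Shard` class, its dict-backed "databases", and `wait_for_replica` are hypothetical, not any RDBMS's replication API.

```python
import queue
import threading


class Shard:
    """Master commits synchronously; the replica is updated in the background."""

    def __init__(self):
        self.master = {}
        self.replica = {}
        self._log = queue.Queue()
        threading.Thread(target=self._ship, daemon=True).start()

    def commit(self, key, value) -> bool:
        self.master[key] = value     # acknowledged once the master has the write
        self._log.put((key, value))  # replica write is queued, never awaited
        return True

    def _ship(self):
        # Background shipping: no two-phase commit, so the replica may lag.
        while True:
            key, value = self._log.get()
            self.replica[key] = value
            self._log.task_done()

    def wait_for_replica(self):
        """Block until every queued write has reached the replica (for tests)."""
        self._log.join()
```

Because `commit` returns before the replica is updated, master performance is unaffected, but a master failure can lose whatever is still in the queue.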

Yes it's interesting. Yes it compares Oracle and MySQL.

For balance it would be helpful to have an analysis from a more independent source than Oracle.

Sharding isn't scaling. It's not application-transparent, and it's an operational nightmare.

Sharding isn't scaling

Sure, sharding is scaling, although there are several types of sharding, with varying degrees of scalability. Usually people use a combination of both database sharding (which is what I was talking about) and table sharding (with replication, clustering, etc.) to achieve scalability. I've encountered several highly scalable db environments like these.
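For readers who haven't seen database sharding, a minimal routing sketch looks like this. It assumes hash-based placement, which is only one of several schemes, and the shard names and `shard_for` function are invented for illustration.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical database shards


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a key to a shard using a stable hash of the key."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

The same key always lands on the same shard, so the application (or a proxy layer) can send each query to one database. Changing the number of shards remaps most keys under this simple modulo scheme, which is part of the operational cost raised above.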

It's not application-transparent

There are several databases that have features that can make sharding transparent to your application.

and it's an operational nightmare

It depends on the RDBMS that you're using.
