When I worked at Amazon.com we had a deeply ingrained hatred for all of the SQL databases in our systems. Now, we knew perfectly well how to scale them through partitioning and other means. But making them highly available was another matter. Replication and failover give you basic reliability, but they're very limited and inflexible compared to a real distributed datastore with master-master replication, partition tolerance, consensus and/or eventual consistency, or other availability-oriented features.
I think everyone is missing the boat by evaluating datastores in terms of scalability. Even at Amazon's size, database scaling is only occasionally a problem. By the time your site is that big, hopefully you've split your software into independent services, each with its own (relatively) small database. And for the services that do need a distributed cache or persistence layer, there's now a huge body of literature and software that solves those problems.
Reliability is another matter. Every business can benefit from improved reliability, regardless of size. And reliability is harder. Scaling is basically an engineering problem, which can be solved with the right architecture and algorithms. Reliability is more like security: it's a weakest-link problem and requires constant vigilance in all areas, including engineering, testing, operations, and deployment. Adding redundant components improves reliability against hardware failures (which are independent between hosts) but not software bugs (which typically are not). Many of the comments on http://news.ycombinator.com/item?id=859058 are missing this point.
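To put rough numbers on the redundancy point, here is a minimal sketch. The function name and the probabilities are made up for illustration, and it assumes hardware failures are fully independent between hosts while a software bug takes out every replica at once:

```python
def outage_probability(replicas: int, p_hardware: float, p_software: float) -> float:
    """Probability that every replica is down at the same time.

    Assumption: hardware failures are independent between hosts, while a
    software bug is a common-mode failure that hits all replicas together.
    """
    all_hardware_down = p_hardware ** replicas
    return p_software + (1 - p_software) * all_hardware_down


# Illustrative numbers only, not measurements.
for n in (1, 2, 3, 5):
    print(n, outage_probability(n, p_hardware=0.01, p_software=0.001))
```

Under these assumptions the hardware term shrinks quickly as replicas are added, but the total never drops below `p_software`: the shared bug is the weakest link.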
Anything that mitigates reliability problems has huge potential for saving costs and increasing value. CouchDB, for example, doesn't even provide basic scaling features like automatic partitioning, but it does provide reliability features like distribution, replication, snapshots, record-level version history, and MVCC. That is what makes it interesting and useful in my opinion. Similarly, although people talk about Erlang as a concurrency-oriented language, it's really a reliability-oriented language. The concurrency in Erlang/OTP is not designed for high-speed parallel computation. Rather, it's one part of an overall strategy for reliability, a strategy which also includes the ability to update code in running processes, a standard distributed transactional database, and built-in process supervision trees. (Steve Vinoski wrote something similar in his blog a while ago: http://steve.vinoski.net/blog/2008/05/01/erlang-its-about-re...)
Yes, yes, a thousand times, yes. We seem to be slowly moving toward this ideal at JTV as we grow, because scaling monolithic database architectures is a pain, and it isn't reliable. Even if you scale your database via replication or sharding, you've still got a global choke point for reliability and performance. If your database(s) get slow, if replications lag, if a server dies or whatever, there's a good chance it will take down your whole site. On the other hand, if you use services, you can lose individual features without taking down everything.
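A rough sketch of what "losing individual features" can look like in code. The feature names and functions here are hypothetical, not anything from a real site, and a real system would add timeouts and logging:

```python
from typing import Callable, Dict


def render_page(features: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Call each feature's backing service; a failure only blanks that feature."""
    page = {}
    for name, fetch in features.items():
        try:
            page[name] = fetch()
        except Exception:
            # The service behind this feature is down or slow: degrade it
            # instead of failing the whole page.
            page[name] = "(temporarily unavailable)"
    return page


def video_stream() -> str:
    # Hypothetical healthy service.
    return "stream ok"


def chat() -> str:
    # Hypothetical service whose database is in trouble.
    raise ConnectionError("chat database is lagging")


print(render_page({"video": video_stream, "chat": chat}))
```

With one shared database behind everything, the same failure would surface in every call rather than in one entry of the page.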
I would go so far as to say that it's probably a good idea to build a service-oriented architecture from the start. It's only a bit harder to do, it keeps your code base cleaner and more orthogonal over time, and it makes scaling much easier, should the need ever arise.
Here is a light interview on this (conducted by none other than Jim Gray), published in ACM Queue in 2006.
Also, when used in an active-active configuration, replication actually decreases the reliability of your application as the number of replicated nodes grows, because you're increasing the number of points of failure in your application (although scalability increases). Replication increases reliability only if used in an active-passive configuration. I've run into this issue over and over again.
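One way to read this claim, sketched with made-up numbers: assume an active-active setup only works when every node is healthy, an active-passive setup works as long as any one node survives, node failures are independent, and failover is perfect. All of these are simplifying assumptions, and the function names are only illustrative:

```python
def all_nodes_required(node_availability: float, nodes: int) -> float:
    """Availability when the application needs every node up (series)."""
    return node_availability ** nodes


def any_node_suffices(node_availability: float, nodes: int) -> float:
    """Availability when any single surviving node can serve (parallel)."""
    return 1 - (1 - node_availability) ** nodes


# Illustrative per-node availability of 99%.
for n in (1, 2, 3):
    print(n, all_nodes_required(0.99, n), any_node_suffices(0.99, n))
```

If an active-active cluster can keep serving after losing a node, it behaves more like the second function, so the conclusion depends heavily on that assumption.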
Here's an interesting paper on sharding with Oracle vs. MySQL.
After seeing some real-world, big business problems solved with weakened consistency guarantees, I'm skeptical that there are as many problems that "need" ACID as most people think. Rather, I think that (a) most engineers have not yet been exposed to the reasonable alternatives to ACID, and so have not actually thought about whether they "need" it, and (b) most businesses do not yet have the knowledge they need to weigh ACID's benefits against its availability costs.
For balance it would be helpful to have an analysis from a more independent source than Oracle.
Sure, sharding is scaling, although there are several types of sharding, with varying degrees of scalability. Usually people use a combination of both database sharding (which is what I was talking about) and table sharding (with replication, clustering, etc.) to achieve scalability. I've encountered several highly scalable DB environments like these.
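For anyone unfamiliar with database sharding, a minimal sketch of the routing step, assuming simple hash-based placement. The shard names and the function are hypothetical, and real setups often use range- or directory-based placement instead:

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical database names


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a key to a database shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and pick a shard by modulo.
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# The same key always routes to the same shard.
print(shard_for("user:42"), shard_for("user:42"))
```

Plain modulo placement like this moves most keys when the number of shards changes, which is one reason the operational side of sharding needs planning.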
It's not application-transparent
There are several databases that have features that can make sharding transparent to your application.
and it's an operational nightmare
It depends on the RDBMS that you're using.