

Why you think databases don't scale - In a few words: you’re doing it wrong! - edw519
http://staticallytyped.com/2008/04/10/why-you-think-databases-dont-scale/

======
gcv
The author makes a decent argument, but I don't think he addresses the
entirety of the issue.

Scaling a database has two components: reliability and performance under
heavy load. Achieving both typically requires some form of redundancy, but
few RDBMSes give developers and administrators tools for anything beyond dumb
read-only replicas. Even Oracle RAC, a high-end commercial product, provides
no data redundancy; it just lets multiple database servers talk to the same
physical database. This means that, with lots of money to throw at RAC backed
by a SAN, not to mention a team of admins to run the whole thing, you can
scale up and not worry about data loss. Hopefully.

A team which doesn't have 1M USD a year to spend on storage software,
hardware, and support, but still must store large datasets, must spend a lot
of time partitioning the data and working around the problems inherent in
querying it out of multiple database servers. And frankly, that stinks. That's
why I personally think RDBMSes don't scale: because, as a tool, they utterly
fail to save me time and effort in achieving scalability.
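
To make the pain concrete, here is a minimal sketch of what that manual partitioning tends to look like. Everything in it is an assumption for illustration: the `orders` table, the shard count, and the helper names are hypothetical, and in-memory SQLite databases stand in for separate database servers.

```python
import sqlite3
import zlib

NUM_SHARDS = 3  # hypothetical; in real life each shard is a separate server


def make_shard():
    # An in-memory SQLite database stands in for one database server.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id TEXT, amount INTEGER)")
    return conn


shards = [make_shard() for _ in range(NUM_SHARDS)]


def shard_for(user_id):
    # A stable hash of the partition key picks the shard.
    return shards[zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS]


def insert_order(user_id, amount):
    conn = shard_for(user_id)
    conn.execute("INSERT INTO orders VALUES (?, ?)", (user_id, amount))
    conn.commit()


def total_for_user(user_id):
    # Queries on the partition key touch exactly one shard.
    row = shard_for(user_id).execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0]


def grand_total():
    # Anything else must visit every shard and be merged in application code.
    return sum(
        conn.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()[0]
        for conn in shards
    )
```

Lookups by `user_id` stay cheap, but `grand_total` already shows the problem: any query that is not keyed on the partition column has to fan out to every shard and be combined by hand.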

I'm very impressed with how fragmented Mnesia tables work: you tell Mnesia how
many replicas and fragments you want, add enough nodes to fulfill your
requirements, and it takes care of pushing enough copies of your data to
enough nodes for things to fly. That's how a database should work these days.
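
As a rough illustration of that declarative idea (you state the counts, the system decides where copies go), here is a toy placement function. It is not Mnesia's API or its actual placement algorithm; `place_fragments` and the round-robin strategy are assumptions made up for this sketch.

```python
def place_fragments(nodes, n_fragments, n_replicas):
    """Assign each fragment to n_replicas distinct nodes, round-robin.

    Hypothetical helper: it only illustrates "declare counts, let the
    system place copies", not how any real database does it.
    """
    if n_replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    placement = {}
    for frag in range(n_fragments):
        # Consecutive nodes (wrapping around) hold the copies of a fragment.
        placement[frag] = [
            nodes[(frag + r) % len(nodes)] for r in range(n_replicas)
        ]
    return placement
```

Calling `place_fragments(["a", "b", "c"], 4, 2)` spreads four fragments over three nodes with two copies each; asking for more replicas than there are nodes fails until you add nodes.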

I agree with the author that simplifying data access and reducing the number
of ridiculous joins (I've worked on an app which normalized its data to the
point that it needed an 8-way join before anything would work at all) is
important. However, even once you do that, it's still not good when every
piece of code that talks to the database has to query the entire cluster and
aggregate the results at the application level. That's one reason I think
Google App Engine (GAE) is quite attractive.

Let's see if MySQL 5.1 brings any improvements to the table, and if they come
with strings attached.

~~~
astrec
To be fair, Oracle RAC is not a complete solution and doesn't really pretend
to be. For HA you really need RAC and Data Guard.

Btw, I'm quite the Mnesia fan too.

------
hendler
Great topic, I think.

Scaling has a lot of non-technical components; often it's people doing it
wrong. That's true of me too: sometimes I don't have time to do it right,
which creates demand for an out-of-the-box solution.

Currently I'm experimenting with VerticaDB (<http://vertica.com>). It's easy
to install, uses regular SQL and other RDBMS concepts, and has some newer
optimization techniques out of the box (compression, joins, etc). I can't
speak to its scalability yet, but since it's designed for exactly the issues
I'm trying to solve, I have high hopes.

I also looked into Hadoop/HBase, and there's promise there. Other DHTs are
known to have issues with certain kinds of work.

Hadn't heard of Mnesia.

