
Is the Relational Database Doomed? - timf
http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php
======
dasil003
This article is fairly well-balanced, but the hype around these "hashes in the
sky" is completely out of control. This article tries to be objective, but
it's not doing enough to counter the rabid zeal that is gripping silver-
bullet-seeking, Twitter-wannabe young web developers. My observations are that
too many people are making the following mistakes:

* Overestimating their need to scale - a powerful box with a cache-backed site can serve a LOT of hits. The vast majority of apps will never need to scale beyond this; don't kid yourself.

* Underestimating the amount of work and the cost of failure in maintaining data integrity at the application level - for all but the most simplistic applications you will end up writing a ton of code that replicates what a RDBMS does, except slower, with more bugs, and less generality.

* Underestimating the importance of data integrity - When your massive database with billions of records has subtle data integrity issues, how hard will it be to fix? My guess is that some truly nasty situations will arise and over time the hype will be tempered by the horror stories.

* Underestimating the constraints this puts on future development - Sure, greenfield apps may seem like great candidates for a document-oriented DB, but how often does an app still look like its original version 1 year later, 2 years later, 5 years later? With a relational database your bets are hedged automatically. You can go in a million different directions with your data. The relational model gives you orders of magnitude more flexibility than complex algorithms designed for map-reduce-style scalability. If you don't need the scalability, you are throwing an awful lot of flexibility out the window for nothing.

* Underestimating how relational their data really is - It's not just that developers understand the relational model; it's the fact that it's a rigorous theoretical model that actually models relationships in an academically complete way. Sure, real-life RDBMSs don't live up to the theory, and sometimes things are too slow to be practical, in which case you fudge things as necessary or add layers of caching. But at the end of the day, the relational model can cover pretty much anything, whereas these hash databases give you a small set of scalable functionality and some clever algorithms to accomplish a bunch of different things. The things you can accomplish are not comprehensive in a theoretical sense; they are just techniques that have proven useful for a number of applications recently. However, their limitations are not as well defined as those of an RDBMS.
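
To make the integrity point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is an assumed stand-in (the comment names no specific engine), and the `customers`/`orders` tables are hypothetical. Because the constraints live in the schema, the database itself rejects an orphaned order or a negative total, checks you would otherwise have to re-implement in application code.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory SQLite database as a stand-in for "an RDBMS"; tables are hypothetical.
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " total_cents INTEGER NOT NULL CHECK (total_cents >= 0))"
    )
    return conn


def try_insert_order(conn: sqlite3.Connection, order_id: int,
                     customer_id: int, total_cents: int) -> bool:
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (id, customer_id, total_cents) VALUES (?, ?, ?)",
                (order_id, customer_id, total_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
print(try_insert_order(conn, 1, 1, 500))   # True: customer 1 exists
print(try_insert_order(conn, 2, 99, 500))  # False: no customer 99 (foreign key)
print(try_insert_order(conn, 3, 1, -10))   # False: negative total (CHECK)
```

SQLite needs `PRAGMA foreign_keys = ON` for each connection; many other engines enforce declared foreign keys by default.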

~~~
miked
>> Underestimating the amount of work and the cost of failure in maintaining
data integrity at the application level - for all but the most simplistic
applications you will end up writing a ton of code that replicates what a
RDBMS does, except slower, with more bugs, and less generality.

If I'm the first to explicate the underlying rule, can I name it after myself?
Please? Pretty please with Lisp on it?

Doffing's Tenth Rule of Database Systems (with apologies to Philip
Greenspun):

"Any sufficiently complicated database management system contains an ad hoc,
informally-specified, bug-ridden, slow implementation of half of a good
RDBMS."

------
russell
The answer is "no." The real question is: will key/value cloud computing
databases replace relational ones? RDBs are very good at modeling your data,
but have scaling problems. The cloud computing solutions scale well but are
really poor at modeling data; you have to do it yourself. Example problem: you
commit an update to a record, but subsequent reads may not see the change until
the DB gets around to propagating it some time later. You can't run an airline
or a bank that way. The eventual solution will have RDB-like modeling with
cloud-like scaling, replication, robustness, and all that good stuff. The real
solution will let you request high data integrity for some data along with
I-don't-care-what-happens for storing your RSS feeds or Slashdot comments.
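
The read-after-write problem above can be illustrated with a toy model. The `EventuallyConsistentStore` class is hypothetical, not any real product's API: writes land on a primary immediately and reach a replica only when replication runs, so a read from the replica in between returns stale data.

```python
from collections import deque


class EventuallyConsistentStore:
    """Hypothetical toy model: writes go to a primary and reach the replica
    only when replicate() runs, mimicking asynchronous propagation."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # updates not yet applied to the replica, oldest first

    def write(self, key, value):
        # The write "commits" on the primary immediately...
        self.primary[key] = value
        # ...but is only queued for the replica.
        self.pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def replicate(self):
        # In a real system this happens "some time later", on the store's schedule.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.write("seat:14C", "booked")
print(store.read_from_replica("seat:14C"))  # None: the update has not propagated yet
store.replicate()
print(store.read_from_replica("seat:14C"))  # booked
```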

~~~
jacobscott
Whoever solves the "scalability or consistency: choose one" problem will make
a boatload of money. From what I can tell this is a very hard problem, so
software that makes it seamless to operate at multiple levels of this tradeoff
curve -- perhaps at the same time, for different parts of your data -- would
also be a big win.
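
One existing way to expose that tradeoff per request is the quorum scheme popularized by Amazon's Dynamo (the Dynamo paper is discussed in a link further down this thread). With N replicas, a write waits for W of them and a read consults R of them. Choosing R + W > N forces the read set to overlap the write set. The sketch below is a toy, single-process model under that assumption. The `QuorumStore` class is hypothetical and ignores failures, conflict resolution and anti-entropy.

```python
class QuorumStore:
    """Hypothetical Dynamo-style toy with N replicas; each replica maps key -> (version, value).
    Callers choose per request how many replicas to write (w) and read (r)."""

    def __init__(self, n: int):
        self.n = n
        self.replicas = [{} for _ in range(n)]
        self.next_version = 0

    def put(self, key, value, w: int):
        self.next_version += 1
        # Only the first w replicas acknowledge the write; the others stay stale.
        for replica in self.replicas[:w]:
            replica[key] = (self.next_version, value)

    def get(self, key, r: int):
        # Deliberately read the LAST r replicas (the worst case for overlapping
        # with the write set) and return the value with the highest version.
        answers = [rep[key] for rep in self.replicas[self.n - r:] if key in rep]
        if not answers:
            return None
        return max(answers, key=lambda pair: pair[0])[1]


store = QuorumStore(n=3)
store.put("balance", 100, w=3)    # durable write to all replicas
store.put("balance", 50, w=1)     # fast write acknowledged by one replica
print(store.get("balance", r=1))  # 100: R + W = 2 <= N, the read can miss the latest write
print(store.get("balance", r=3))  # 50: R + W = 4 > N, read and write sets must overlap
```

Different keys (or different callers) can pick different R and W, which is one way to sit at several points on the curve at once.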

~~~
jasonkester
I see this as an area where Microsoft could pull out a big win. Imagine if,
instead of their half-baked BigTable clone that they just released, they had
instead put 1000 or so brains on the problem of distributing a single SQL
Server database across N machines.

Scaling out to a dozen DB servers that master/slave their way to scalability
is no fun, but it's solved. The problem is that you're currently required to
do it yourself. It would rock to be able to outsource that to the cloud.

I want to toss my ASP.NET application up into the Microsoft Cloud, where it
will figure out how many webservers it needs to spread itself across and how
many database servers it needs to fire up to handle the load it's seeing. And
I want it to pretend like it's a single webserver talking to a single DB
instance on a single box.

Say what you will about Microsoft, but they have the skills to pull that off.
I sure hope they're working on it.

~~~
wmf
_Imagine if, instead of their half-baked BigTable clone that they just
released, they had instead put 1000 or so brains on the problem of
distributing a single SQL Server database across N machines._

Unfortunately brains don't scale, so you're better off using one huge brain,
like Michael Stonebraker.

<http://db.cs.yale.edu/hstore/>

~~~
mattculbreth
Right, and his commercial startup is Vertica <http://www.vertica.com>

------
jerf
Context: So, why are relational databases going to die? Because only key-value
hash lookups are fast enough. Wait, can't RDBMSs do hash-based lookups too? Yes,
they can. So what's the problem? Well, if you want to take advantage of the
relations, you have to use joins and such. Why is that bad? The query plans
the RDBMS uses end up being slow. Putative solution? Throw out RDBMSs, replace
them with key/value DBs (which are strict and severely impoverished subsets of
RDBMSs), and implement your own relations.
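
Here is a minimal sketch of what "implement your own relations" means in practice. A plain Python dict stands in for the key/value store and SQLite for the RDBMS. Both are stand-ins, and the `post:<id>` and `user:<id>:posts` key conventions are hypothetical. On the key/value side, the one-to-many relation becomes an index you maintain by hand, and the join becomes a loop of lookups. On the SQL side it is one query whose plan the database chooses.

```python
import sqlite3

# --- Key/value side: a dict stands in for the store (hypothetical key scheme). ---
kv = {}


def kv_add_post(post_id, user_id, title):
    kv[f"post:{post_id}"] = {"user_id": user_id, "title": title}
    # Hand-maintained index playing the role of a foreign key;
    # forget this line and the relation silently breaks.
    kv.setdefault(f"user:{user_id}:posts", []).append(post_id)


def kv_titles_for_user(user_id):
    # The "join" in application code: one lookup for the index, then one per post.
    return [kv[f"post:{pid}"]["title"] for pid in kv.get(f"user:{user_id}:posts", [])]


# --- Relational side: SQLite as a stand-in RDBMS; the relation is just a column. ---
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, "
           "user_id INTEGER REFERENCES users(id), title TEXT)")


def sql_titles_for_user(user_id):
    # The database's query planner decides how to execute the join.
    rows = db.execute(
        "SELECT p.title FROM users u JOIN posts p ON p.user_id = u.id "
        "WHERE u.id = ? ORDER BY p.id",
        (user_id,),
    )
    return [title for (title,) in rows]


db.execute("INSERT INTO users VALUES (1, 'alice')")
for pid, title in [(1, "query plans"), (2, "hash lookups")]:
    kv_add_post(pid, 1, title)
    db.execute("INSERT INTO posts VALUES (?, 1, ?)", (pid, title))

print(kv_titles_for_user(1))   # ['query plans', 'hash lookups']
print(sql_titles_for_user(1))  # ['query plans', 'hash lookups']
```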

My comment: This makes me wonder if the solution shouldn't be looked for in a
different direction. Instead of throwing out the RDBMS, why not give me one
that gives me complete control of the query plan? That's the real sticking
point here, not the relations themselves, which are there whether you maintain
them or not. If the DB can't come up with a good query plan, why not let me
simply feed one to it?

Is there any DB that lets me have complete control over the query plan? (And
even if there is, it's probably a half-hearted, unoptimized feature with
little design effort poured behind it.)

Even so, you may have to give up 40-table joins, but it seems to me that
RDBMSs, with more control given to the user, could still provide a lot of
value without throwing the baby out with the bathwater.

~~~
neilc
_Why is that bad? The query plans the RDBMS uses end up being slow._

I think it's simply not the case that people are using key-value stores
because the query plans that a typical DBMS optimizer chooses are suboptimal
for the kinds of queries most web apps use.

------
miked
The rise of functional programming might not be accelerated by the pain of
impedance matching to RDBMSs. Converting between the relational and OO models
of data is a time-consuming pain. If this is widely realized, it could give FP
a boost.

>> Is the Relational Database Doomed?

Ah, linkbait! Sorry, no sale. Using a (pre-compiled) stored procedure to do a
key value lookup will likely be about as fast as a lookup in a key/value DB.
In any case, it's easier to add a key/value lookup capability to an RDBMS than
it is to add efficient query capabilities to a KVDB.

The article focuses on scaling, which is rarely an issue. KVDBs win on
simplicity and flexibility (no need to pre-define a data model) and a better
impedance match with OOPLs. RDBMSs crush on querying, which most (though not
all) applications need.

~~~
miked
>> The rise of functional programming might not be accelerated by the pain of
impedance matching to RDBMSs.

Arrrgh! I meant that it _might be_ accelerated by the pain of impedance
matching to RDBMSs. Sorry, it was late at night.

------
ocskills
The future is here already, and has been for decades.

These non-relational database structures are not a new or revolutionary idea.
The same concepts have long been implemented in systems like LDAP, X.500, and
document-oriented databases like Lotus Domino.

The real mystery is the recent trend towards completely reinventing the wheel.
For instance, LDAP (the protocol) is incredibly efficient, standardized, and
well supported on most development platforms. It provides a standard query /
filter language, facilities for scoping, and data updates. There are
standardized exchange formats, wide availability of OSS and commercial
management tools, and multiple server implementations.

Yet, there seems to be a great interest in developing from scratch highly
proprietary and non-portable alternatives that do exactly the same thing (or
less). These systems ignore decades of research and lessons learned through
practical implementation.

The real issue though is that direct comparison of the two models is
inherently flawed. They each have their own strengths and weaknesses, each
excelling in situations where the other falls short. Looking at them as
complementary models is perhaps a better approach.

~~~
KWD
As I started reading the article my very first thought went to Lotus Notes
(never developed in it, but had friends who did). I agree with you that treating
them as complementary models is probably a better approach to the comparison.

------
neilk
The future is probably going to be a mix. RDBMSs are good for data integrity.
The non-relational databases are great for distributing queries. I've seen
some apps which use a slow, asynchronous RDBMS to supply denormalized
'documents' for the non-relational stores.
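
A minimal sketch of that pattern follows. It assumes SQLite as the relational source of truth and a dict of JSON strings as the output bound for the non-relational store. Table names, key format and document shape are all hypothetical, and the asynchronous scheduling is left out. A join over normalized tables is flattened into one self-contained document per key.

```python
import json
import sqlite3

# SQLite stands in for the relational source of truth; schema and data are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE articles (id INTEGER PRIMARY KEY,
                           author_id INTEGER NOT NULL REFERENCES authors(id),
                           title TEXT NOT NULL);
    INSERT INTO authors  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO articles VALUES (10, 1, 'first post'), (11, 1, 'second post');
""")


def build_author_documents(conn):
    """Flatten the normalized tables into one self-contained JSON document per
    author, keyed the way a key/value or document store might expect."""
    docs = {}
    rows = conn.execute(
        "SELECT a.id, a.name, r.id, r.title "
        "FROM authors a LEFT JOIN articles r ON r.author_id = a.id "
        "ORDER BY a.id, r.id"
    )
    for author_id, name, article_id, title in rows:
        doc = docs.setdefault(f"author:{author_id}", {"name": name, "articles": []})
        if article_id is not None:  # LEFT JOIN yields NULLs for authors with no articles
            doc["articles"].append({"id": article_id, "title": title})
    # Serialize each document as it would be written to the non-relational store.
    return {key: json.dumps(doc) for key, doc in docs.items()}


documents = build_author_documents(db)
for key, doc in documents.items():
    print(key, doc)
```

The relational side keeps the integrity guarantees; the documents are disposable copies that can be rebuilt from it at any time.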

Mind you, the RDBMS could still make a comeback. The world is looking to non-
relational because since 1998 or so, throwing multiple pieces of hardware and
network capacity at the problem is cheaper than solving it all in one giant
node or cluster. But maybe one day an off-the-shelf computer (perhaps with
better concurrent programming techniques for multicore) will be able to handle
all the queries you would ever want and have the bandwidth to match.

Also, just wondering: exactly how many organizations really need a distributed
database? There are some top websites, like LinkedIn, that do it all in one
giant node, and my guess is that even sites like Facebook have only a few
hundred database shards, maybe a few thousand at most.

~~~
mooism2
I do wonder how many people on EC2 use SimpleDB because they need (or think
they will need) the scalability, as opposed to thinking it's easier to code
for, or simply just so they can avoid the drudgery of running their own RDBMS.

------
tdavis
Are Non-Baiting Titles Doomed?

~~~
diN0bot
yes

(i meant to say that i hope one-word answers are doomed as well)

------
wmf
There may be a good point in here somewhere, but it's not clear what it is.
Relational databases don't scale, so... try CouchDB! or Drizzle! ...which also
don't scale. What?

For a more complete list of RDBMS-killing science projects, try
<http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/>
(discussion: <http://news.ycombinator.com/item?id=440687>)

------
bayareaguy
Mr. Bain's article would be more worthwhile if he were to elaborate on what it
means for a knowledge representation system to be "doomed".

The relational model proposed by Codd[1] was formulated so that applications
which adhere to it would continue to work and be free of inconsistencies when
the underlying data is updated or reorganized. That may not sound like much
but it's surprisingly useful when the data is important or valuable and needs
to always be correct. This is also the whole point of all those irritating
normal forms.

The relational model has a lot of shortcomings but it's not likely to be
replaced by anything that doesn't address these issues.

That said, for some kinds of data it is definitely overkill. Religiously
following the relational model for short-lived stuff with little significance
makes about as much sense as filling out your shopping lists in triplicate,
storing permanent copies in a safe deposit box and requiring two signatures
for every alteration, the same way you might if you were changing your will or
a deed to your property.

[1] - "A Relational Model of Data for Large Shared Data Banks",
<http://www.cis.upenn.edu/~zives/03f/cis550/codd.pdf>

~~~
JunkDNA
Codd's paper is instructive from a historical perspective as well. When you
read it, you realize that people were grappling with all these same issues
before the relational database came on the scene. His description of
hierarchical data storage systems describes some of the issues with things
like XML databases and XQuery pretty well.

------
tokenadult
From the submitted article: "But in making your decision, remember the
database's limitations and the risks you face by branching off the relational
path.

"For all other requirements, you are probably best off with the good old
RDBMS. So, is the relational database doomed? Clearly not. Well, not yet at
least."

So the answer is no. The article doesn't do justice to the theoretical rigor
of the relational database model. An online posting with some interesting
discussion of technical trade-offs in popular terms and links to other posts
can be found at

<http://highscalability.com/paper-dynamo-amazon-s-highly-available-key-value-store>

------
jrockway
RDBMSs are not doomed, but people are beginning to realize that they are not
the Solution To Every Problem. I don't know why people ever started treating
them like that; they really only handle one (small) problem space well --
modeling relational data. If that's the problem space you're working in,
there's no better tool. If that's _not_ the problem space you're working in,
then now we have some better options.

The reality is that 99% of web apps don't really want a RDBMS. There are no
arbitrary queries to run, they just need to extract some objects from the
database, interact with them, and render a web page. As of late, people have
been using ORMs to make their relational database look like an object
database. The problem is that the objects you get from an ORM don't work like
real objects (try making a cyclical structure, or try `change-class`-ing the
objects; doesn't work). This usually means you need to map the objects from
your ORM into a better model of your app, adding complexity. (Some ORMs also
screw up the relational part, making the data in the database nearly useless
to any other apps. Oops.)

People are starting to realize that this model is a waste of their time, and
they are using object databases instead. Now there is no ORM: they write the
objects that the app interacts with, and those are persisted as needed.
Suddenly half your app's code is gone, and it runs faster. That is why people
are excited. Less code is always exciting.

This article is really heavy on key/value databases, probably because they are
easy to understand. But really, key/value databases are like the assembly
language of databases. You will have to do a lot of work to run your app on a
key/value database. It is better to use something higher-level, like CouchDB
or an object database. (For an OODB, I recommend KiokuDB. It has indexing
(stolen from Postgres), and can store your data to BDB, a directory on disk, a
RDBMS, Amazon SimpleDB, or CouchDB. This gives you a lot of flexibility.)

~~~
fauigerzigerk
"The reality is that 99% of web apps don't really want a RDBMS. There are no
arbitrary queries to run, they just need to extract some objects from the
database"

Well, I'll tell you what my reality has been. It's been that every time I
started a project, it appeared like I just needed to extract some objects from
the database. But very soon it turned out that there wasn't just one way to
view that information. There were two or three important use cases that
destroyed my preferred, supposedly "natural" view.

Now, I'm fully aware that sometimes one preferred view of the information you
have is so dominant that you have to optimize for that case. The question is,
how do you optimize? Do you hard-code that particular view from the beginning,
or do you normalize first and then layer an optimized view on top of it, e.g.
in the form of caching or materialized views? RDBMS make it easy to do the
latter.
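
Here is a minimal sketch of "normalize first, then layer an optimized view on top," using SQLite through Python's `sqlite3` module. SQLite is an assumed stand-in, and the table names are hypothetical. SQLite has no materialized views, so a cache table refreshed from an ordinary view plays that role here. Engines such as PostgreSQL offer `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW` directly.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Normalized base tables: the source of truth (hypothetical schema).
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE comments (id INTEGER PRIMARY KEY,
                           user_id INTEGER NOT NULL REFERENCES users(id),
                           body TEXT NOT NULL);

    -- One "preferred view" layered on top, without changing the base schema.
    CREATE VIEW comment_counts AS
        SELECT u.name AS name, COUNT(c.id) AS n
        FROM users u LEFT JOIN comments c ON c.user_id = u.id
        GROUP BY u.id;

    -- SQLite has no materialized views, so a cache table emulates one.
    CREATE TABLE comment_counts_cache (name TEXT PRIMARY KEY, n INTEGER NOT NULL);

    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO comments VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'hi');
""")


def refresh_cache(conn):
    """Rebuild the precomputed copy from the view (e.g. on a schedule)."""
    with conn:
        conn.execute("DELETE FROM comment_counts_cache")
        conn.execute("INSERT INTO comment_counts_cache SELECT name, n FROM comment_counts")


refresh_cache(db)
print(db.execute("SELECT name, n FROM comment_counts_cache ORDER BY name").fetchall())
# [('alice', 2), ('bob', 1)]

# A different use case just queries the same base tables another way.
print(db.execute("SELECT body FROM comments WHERE user_id = 2").fetchall())
```

Because the base tables stay normalized, a new use case is another query rather than a schema redesign. The cost is that the cache is stale until the next refresh.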

You say "they really only handle one (small) problem space well -- modeling
relational data."

I think being relational is not an a priori property of data and relational
modelling is not a problem space. It's the other way around. First you have a
problem space, then you decide how to model it, and then you get the data
according to that model. Assuming that information already has a kind of
"natural" model attached to it is a fallacy in my view.

------
yason
Where does the notion come from that it has to be either this or that?

Why not use a relational database for data that actually suits the relational
model, and an object/hierarchical/key-value database for complete entities that
are often mapped to objects in code? Just put each kind of data in whatever
storage makes sense.

If you start with an RDBMS, there are plenty of options for how to store the
objects. You can just use plain files in the simplest case. Or a good dbm-style
database. Or if your objects are mostly read-only, you can keep the data in
the RDB and cache the big joins. Or if you're really stuck with only the RDB,
you can even store the key-value pairs in it: just make a two-column table for
run-time data and keep it separate from the actual relational data. Or keep
everything in the RDB, but make a live copy of it when the user logs in to keep
the runtime data accessible, and then unpack it back into the RDB lazily or
when the user logs out. There are pretty much endless possibilities depending
on your business and search needs.
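
The two-column-table option can be sketched as follows. It assumes SQLite via Python's `sqlite3` module and JSON-encoded values; the `kv` table and the `kv_put`/`kv_get` helpers are hypothetical. The primary key's index turns each `kv_get` into a single indexed lookup, which is also why a plain key/value lookup inside an RDBMS need not be expensive.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")  # SQLite as a stand-in RDBMS
# Hypothetical two-column table for run-time data, kept apart from the relational tables.
db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


def kv_put(conn, key, obj):
    with conn:
        # INSERT OR REPLACE overwrites any existing row with the same primary key.
        conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                     (key, json.dumps(obj)))


def kv_get(conn, key, default=None):
    # The PRIMARY KEY's unique index makes this a single indexed lookup.
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else default


kv_put(db, "session:42", {"user": "alice", "cart": [3, 7]})
kv_put(db, "session:42", {"user": "alice", "cart": [3, 7, 9]})  # overwrite
print(kv_get(db, "session:42"))  # {'user': 'alice', 'cart': [3, 7, 9]}
print(kv_get(db, "session:99"))  # None
```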

Ain't easy, but non-trivial databases never are.

------
JulianMorrison
Aside from RDB and key-value, one other model I haven't heard mentioned much
is Terracotta (currently the only example of its kind, AFAIK). That gives you
basically all the advantages of a database, including replicated ACID
transactions, but with data that behaves like live objects. Better than live
objects, in fact, because of lazy loading and unloading, which let objects
grow beyond the limits of RAM. Better than a database, too, in that it only
moves diffs, and it knows exactly where they're needed. If you are the only
one currently touching a data structure, the only traffic is "may I?" "carry
on until I say otherwise".

------
rgrieselhuber
I've been fascinated with key-value databases for almost my entire career.
Interestingly enough, I first came across the concept when I did a little work
with Lotus Notes, way back in 2001. My memory is a little rusty on the full
capabilities of Lotus databases, so I don't know if it applies to their
format, but what I miss most in the other HashMaps in the cloud is aggregate
functions.

As a developer, I'd prefer one of two innovations: relational databases that
scale better across machines or k/v databases that provide a familiar
mechanism for ad-hoc querying and aggregation across keys.
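
To illustrate the missing aggregates, here is a minimal sketch that contrasts two approaches. One is a client-side scan over a key/value store: a plain dict with hypothetical `sale:<region>:<id>` keys. The other asks SQLite the same question as one ad-hoc `GROUP BY` query. Both stores are stand-ins chosen for the example.

```python
import sqlite3
from collections import defaultdict

# Key/value flavour: a dict stand-in; keys encode "sale:<region>:<id>", values are cents.
kv = {
    "sale:eu:1": 1200,
    "sale:eu:2": 800,
    "sale:us:1": 500,
}


def kv_total_by_region(store):
    # No server-side aggregates: scan every key and sum on the client.
    totals = defaultdict(int)
    for key, amount in store.items():
        _, region, _ = key.split(":")
        totals[region] += amount
    return dict(totals)


# Relational flavour: SQLite stand-in, where the same question is one ad-hoc query.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, cents INTEGER)")
db.executemany("INSERT INTO sales (region, cents) VALUES (?, ?)",
               [("eu", 1200), ("eu", 800), ("us", 500)])


def sql_total_by_region(conn):
    return dict(conn.execute("SELECT region, SUM(cents) FROM sales GROUP BY region"))


print(kv_total_by_region(kv))   # {'eu': 2000, 'us': 500}
print(sql_total_by_region(db))  # {'eu': 2000, 'us': 500}
```

The key/value version works only because the region was baked into the key format up front; a new grouping means a new key scheme or a full scan.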

------
minalecs
why can't they both just get along

------
Allocator2008
A colleague once pointed out to me that my preference for flat file systems
over relational databases was likely why I tended to have trouble forming and
maintaining relationships. This I could not dispute, but nor has the knowledge
altered my preference for flat files.

------
c00p3r
The memcached + MySQL solution was invented by livejournal.com and is used by
facebook.com. So-called in-memory databases are also on the market. The second
important factor is the price of RAM modules, SSDs and commodity hardware. In
some cases it is better to create an in-house solution that reflects your data
structures and data flows.

~~~
c00p3r
Thanks for downvoting me! I didn't know about CouchDB! I'm stupid!

------
jacquesm
no.

~~~
bprater
Insightful!

------
bprater
I'm sure the day will come when RDBMSs will be a distant memory, but sometimes
it's so comforting to open a database and see all the data in pretty columns
and rows.

