
The Death of the Relational Database - pius
http://whydoeseverythingsuck.com/2008/02/death-of-relational-database.html
======
davidmathers
Here's the comment I left on Hank's blog:

Hi Hank, I like to be brief so this might sound bad, but I mean everything in
the friendliest possible way.

You and almost all the people arguing against you are wrong about almost
everything. I don't mean to say anyone's opinion is wrong. I'm talking about
basic understanding of what certain words mean.

Let me break it down. There are 3 models for "programming" (in the general
sense) computers:

1\. Functional

2\. Relational

3\. Imperative

The functional model can't store data and therefore can't be used to create a
database. So there are fundamentally only 2 kinds of databases.

A database created using the Relational model uses relations to both store and
retrieve data, so let's call it a relational database. A database created using
the Imperative model uses pointers & nodes to store data and pointer
navigation to retrieve data, so let's call it a navigational database.

That's it. There are only 2 database models. Each model can be used to
implement different kinds of databases based on the limits they place on the
structure.

There are 2 primary kinds of navigational database: graph/network and
tree/hierarchy. A filesystem for example is a tree/hierarchy database.

A relation is basically a truth table with columns that are related to each
other by a truth statement and rows of truth values that fulfill the truth of
the statement. A standard relational database doesn't place any limits on the
number of rows or columns. A binary database limits the number of columns to
2.

A SQL DBMS is a (partially successful) attempt to implement a language that
can be used to create a database which uses the relational model.

OK, the important parts:

1\. The semantic web is an implementation of the relational model that limits
the relations to 3 columns and a single row.

2\. Just because you use a SQL DBMS to create a database doesn't mean you
actually created a relational database. You can put pointers in your tables,
turning your relations into nodes on a graph, turning your database into a
navigational database with some relational features.

Much of what you said in your original post was exactly backwards. You said
"relational sucks" but the things you described as problems were features of
navigational databases, not relational. Then you said "the semantic web is
awesome because it's a navigational database" when in fact it's a relational
database.

That's all.

~~~
jules
> Let me break it down. There are 3 models for "programming" (in the general
> sense) computers:
>
> 1\. Functional 2\. Relational 3\. Imperative

Could you explain why you picked these three models?

Do you mean relational in the sense that you don't have input-output
functions, but relations like:

    
    
        plus(4,4,x) => x = 8
        plus(y,4,6) => y = 2
    

Why can you store data with a relational model but not with a functional
model?

~~~
davidmathers
Functional and relational are 2 sides of the same coin. For instance:

x + y = z

Can be viewed as either a function called binary_addition with x & y as inputs
and z as the output, or as the description of a relationship between 3 sets.

So, to solve the problem using SQL for example you would create a table
binary_addition(x,y,z) and then fill it with all the true values that caused x
+ y = z to be true and then say "select z from binary_addition where x = 4 and
y = 4"

In the functional model the computer stores the process for turning the input
values you give it into the output values you want. In the relational model
the computer stores a table containing all known possible input values, all
known possible output values, and how they relate to each other, and gives you
a way to retrieve them.
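
A minimal sketch of the two views side by side, assuming Python's built-in `sqlite3` module and an arbitrarily small domain (0 through 9) so that the table stays finite. The function name and the domain size are illustrative choices, not part of the argument above.

```python
import sqlite3

# Functional view: the computer stores the process that turns inputs into an output.
def binary_addition(x, y):
    return x + y

# Relational view: the computer stores every known (x, y, z) for which x + y = z is true.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE binary_addition (x INTEGER, y INTEGER, z INTEGER)")
rows = [(x, y, x + y) for x in range(10) for y in range(10)]  # assumed small domain
conn.executemany("INSERT INTO binary_addition VALUES (?, ?, ?)", rows)

# "select z from binary_addition where x = 4 and y = 4"
z_rows = conn.execute(
    "SELECT z FROM binary_addition WHERE x = 4 AND y = 4"
).fetchall()

# The same table also answers the question "backwards": which x gives x + 4 = 6?
x_rows = conn.execute(
    "SELECT x FROM binary_addition WHERE y = 4 AND z = 6"
).fetchall()
```

The second query is the point of the relational view in this sketch: no column is privileged as "the output", so any of them can be the unknown.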

Note, the functional model is implemented by what that famous guy whose name I
can't remember called "function-level" programming languages, not functional
(aka lambda) languages.

~~~
davidmathers
The famous guy is named Robin Milner. He created the ML language. In lambda
languages like lisp or javascript you can use functions as arguments and
return values. In function-level, or applicative, languages you can only create
new functions by combining existing functions. I think. I've never used a
function-level language.

------
bumbledraven
The author is misinformed in many ways. To pick an easy example, he says that
"the relationship between objects is _built into_ the objects", and as an
example cites, "An invoice knows _as part of its structure_ , who the customer
is. That pointer to the customer is stored _in_ the invoice."

But that doesn't have to be the case at all. It's common (in general, maybe
not with customers and invoices) to store the relationship in a third table,
say "customerinvoice", that has customerid and invoiceid fields.
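
As a rough illustration, assuming SQLite via Python's `sqlite3` module (the `name` and `total` columns and the sample rows are made up for the example), the third-table layout could look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice  (invoiceid  INTEGER PRIMARY KEY, total REAL);
    -- The relationship lives in its own table, not inside either row.
    CREATE TABLE customerinvoice (
        customerid INTEGER REFERENCES customer(customerid),
        invoiceid  INTEGER REFERENCES invoice(invoiceid),
        PRIMARY KEY (customerid, invoiceid)
    );
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO invoice VALUES (100, 250.0), (101, 75.5);
    INSERT INTO customerinvoice VALUES (1, 100), (1, 101);
""")

# Find a customer's invoices by going through the relationship table.
invoices_for_acme = conn.execute("""
    SELECT i.invoiceid, i.total
    FROM customerinvoice ci
    JOIN invoice i ON i.invoiceid = ci.invoiceid
    WHERE ci.customerid = ?
    ORDER BY i.invoiceid
""", (1,)).fetchall()
```

Here the `invoice` row carries no customer column at all; the link exists only as a row in `customerinvoice`.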

~~~
stcredzero
I agree the author is misinformed, though I also agree with him that
relational databases suck, especially if you are doing object oriented
programming. I'm not sure what is better, however. Also, most of the corporate
world is firmly ensconced in the relational database mindset. This makes
interacting with them difficult if you do not also "speak relational."

~~~
michaelneale
Maybe it's object oriented dogma that is a problem.

~~~
KiwiNige
I've worked with a lot of accounting data, which fits the relational DB model
really well. But I can't understand why I have to use an OO model to display
it on a screen as a table of rows and columns with the odd field scattered
here and there. It just seems like a lot of extra complexity to me when all
the users want to see is SELECT * FROM TRANSACTION.... made to look pretty.

~~~
michaelneale
Indeed. OO is so ill defined anyway. What is documented as OO now I am sure is
not what Alan Kay intended it to be. New popular frameworks like Rails don't
help either (much) - although it's OO-lite - so it's not so bad.

Still, I am looking forward to an excuse to crack out stuff like Arc and walk
away from OO for a while (I am allowed to dream).

~~~
stcredzero
I've been working with Smalltalk, which was Alan Kay's creation. Most
implementations are not what he intended -- at one Smalltalk Solutions keynote
he excoriated all of us. He said that he never intended Smalltalk to become a
programming language. He wanted to create a Montessori toy for the mind. That
said, I don't think that Smalltalk as a programming language is that far from
what he intended. He just never intended it to stop there.

------
giardini
The article is worthless: not a single sentence of the first 5 paragraphs of
the blog post makes sense when examined critically. The author does not
understand relational databases nor how flexible they are. In fact I'm fairly
certain he understands neither OOP, nor RDBMS, nor the Semantic Web (of which
I am no proponent) well if at all.

Every now and then a developer community bubbles over with complaints about
RDBMS and gets some attention. Most support is from people who, like the
author, understand OOP to a certain degree but don't understand RDBMS.

And time after time predictions of the death of the relational database model
prove wrong: RDBMS usage only increases. The relational database model
supplanted the network database model (which corresponds to the "graph
databases" the author speaks of) for good reasons.

Nothing to see here: keep moving folks.

~~~
vixen99
Am I alone in finding your comment unnecessarily unpleasant? Instead of
offering unilluminating pejorative rhetoric why don't you provide even one
example of why you take issue with the article? Perhaps you're right about it
but you give no reason for supposing this to be so. Also, how about letting
the well-worn 'move along now' cliche enjoy a well-deserved rest?

~~~
giardini
Perhaps not.

No, I believe it makes sense to draw a line. When a person (indeed, even a
specialist) claims special insight (including critical insight) of a well-
examined problem he is almost always wrong. The physicist John Baez not
infrequently encounters cranks who believe they've found errors in relativity
theory or quantum mechanics. He has developed a scale for rating cranks:

[http://groups.google.ca/group/sci.physics/msg/5312a801e0785e...](http://groups.google.ca/group/sci.physics/msg/5312a801e0785e66?hl=en&);

The OP is wrong in so many ways that it renders his article meaningless. And
others have pointed out possible errors (although doing so one must interpret
the OP's intentions, a risky endeavor indeed), though certainly not to
exhaustion. To add a single specific item of criticism to the fray would only
provide yet another handle for the OP or other misled persons to grasp and
extend the discussion uselessly.

The human mind can create ideas, phrases, and analogies some of which, upon
further examination, are devoid of meaning. Dreaming is an extreme instance
wherein most of the ideas later make no sense. However the same thing can
happen while fully conscious and is part of the normal creative process.

Mathematics and logic are tools we use for separating empty ideas from useful
and meaningful ones. Unfortunately there is no Royal Road to mathematics or
logic, nor to relational databases:

<http://en.wikipedia.org/wiki/Royal_Road>

I have neither the time, nor the inclination, much less the rhetorical skill
to enlighten the OP or this group as to the vagaries of databases.

Nor do I view this as a "rhetorical" discussion: rhetoric is concerned with
swaying the populace to your side of the argument whether you are correct or
not. I am concerned about what is correct rather than what is popular.

I do not doubt the enthusiasm (or frustration) of the OP, however his
complaints are poorly-stated, unclear and originate from an incomplete
understanding of logic and relational databases. Many similar complaints have
been stated before (often much more clearly and in a form arguable) in more
appropriate venues (e.g., Google for "relational vs OOP group:comp.*"), where
they have been thrashed about thoroughly by better, and worse, men than me.

It is one thing to register frustration. But it is another to casually
question ideas that have withstood the test of time and cast that questioning
as serious.

To show that frustration in the development of databases is nothing new see
William Kent's "Data and Reality":

[http://www.amazon.com/Data-Reality-William-Kent/dp/158500970...](http://www.amazon.com/Data-Reality-William-Kent/dp/1585009709)

------
tx
I am surprised by so many "author is confused" and "author is misinformed"
responses. Can't you guys operate on a higher level of abstraction? Aren't we
all here dynamic language lovers?

Just listen to what he says more closely, because he is essentially suggesting
that strong typing is bad for databases for the exact same reasons it's bad for
programming languages. Yes, it makes things faster, more efficient, robust but
... (surprise!) less flexible and dumb.

"Duck-typed storagebases" are indeed the future and perhaps the number of
negative reactions is the best indicator of how novel the idea is.

I am in disagreement with "semantic web" movement (in my opinion it's already
semantic enough), but the storage part is spot on.

------
sant0sk1
"Now, along comes the semantic web just in time to make us all feel really
dumb again."

This happens to me almost daily, and I love every minute of it...

------
BrandonM
_For example, imagine starting out with a contact list. Some months later, you
add a restaurants list. Some months later again, you decide it would be great
to be able to capture, for each contact, what their favorite restaurants are.
Ideally one would want to just establish a “favorite” relationship between a
restaurant and a contact without changing the restaurant structure or the
contact structure._

Let's look at this example in a relational database. Personally, I've never
even implemented a database, but I did take one class on relational databases.
The way I learned it, the contacts would be one table and the restaurants
would be another table. A third table, let's call it "FavoriteRestaurant"
would have two columns [1]: a foreign key to an entry in the Contacts table
and a foreign key to an entry in the Restaurants table. The primary key in
this table would have to be the contacts column, since restaurants would
appear more than once. If each person can appear in the FavoriteRestaurant
table more than once (multiple favorite restaurants), then both columns would
have to serve as the primary key.
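
A sketch of that design, assuming SQLite through Python's `sqlite3` module. The column names, the sample rows and the `table_sql` helper are hypothetical; the example only tries to show that the third table can be added later without altering the first two, and that a two-column primary key allows several favorites per contact while rejecting exact duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Contacts    (contact_id    INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Restaurants (restaurant_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Contacts VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Restaurants VALUES (10, 'Thai Place'), (11, 'Pizza Corner');
""")

def table_sql(name):
    # Hypothetical helper: the CREATE statement SQLite has recorded for a table.
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = ?", (name,)
    ).fetchone()[0]

before = (table_sql("Contacts"), table_sql("Restaurants"))

# "Some months later": add the relationship as a new table.
conn.executescript("""
    CREATE TABLE FavoriteRestaurant (
        contact_id    INTEGER NOT NULL REFERENCES Contacts(contact_id),
        restaurant_id INTEGER NOT NULL REFERENCES Restaurants(restaurant_id),
        PRIMARY KEY (contact_id, restaurant_id)  -- both columns form the key
    );
    INSERT INTO FavoriteRestaurant VALUES (1, 10), (1, 11), (2, 10);
""")

after = (table_sql("Contacts"), table_sql("Restaurants"))

# The composite key rejects recording the same favorite twice.
try:
    conn.execute("INSERT INTO FavoriteRestaurant VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

alice_favorites = [row[0] for row in conn.execute("""
    SELECT r.name
    FROM FavoriteRestaurant f
    JOIN Restaurants r ON r.restaurant_id = f.restaurant_id
    WHERE f.contact_id = 1
    ORDER BY r.name
""")]
```

Comparing `before` and `after` is just a way to check that the definitions of `Contacts` and `Restaurants` were left untouched.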

Thus, we have managed to effectively utilize a relational database to express
a new relationship, without ever changing the original data. The author said:

 _Most relational databases actually have an upper limit on the types of
objects, typically referred to as tables, which can be handled. Too many
tables in a database schema is considered bad design._

If that is indeed the case, that is where the problem lies. I am far from
being a champion of relational databases, but it seems to me like a lot of
people don't think critically about how best to store the information in their
databases. More tables, in my mind, is a good thing.

[1] The number of columns would actually be equivalent to the sum of the
number of items in the primary keys of both the Contacts and Restaurants
tables. With unique identifiers (SSNs and vendor IDs, for example), of course,
this would indeed be two columns.

~~~
pius
_A third table, let's call it "FavoriteRestaurant" would have two columns [1]:
a foreign key to an entry in the Contacts table and a foreign key to an entry
in the Restaurants table._

Yup, you could definitely express a new relationship that way. I think the
point is that adding join tables like that has traditionally been considered
an anti-pattern for relational databases because it increases duplication and
denormalizes the data, thus working against the supposed performance gains and
data integrity protection from using the RDBMS in the first place. If this is
wrong, please do correct me.

One major difference I've noticed between document/graph-oriented databases
and relational ones is that they embrace denormalization and even optimize for
it insofar as that's possible.

~~~
LogicHoleFlaw
_adding join tables like that has traditionally been considered an anti-
pattern for relational databases because it increases duplication and
denormalizes the data_

I thought that join tables express a _normalization_ of the data? You are then
not storing restaurant data explicitly as a column in the Contact table, which
reduces data duplication and gives you more fine-grained control over your
structure.

Join tables (especially reflexive ones) gave me a bit of a headache when I
first started working with SQL databases. Once I finally wrapped my head
around them I started seeing a lot of uses for and advantages of them.
However, I've had little formal training in database techniques; only a little
bit of relational algebra. Is there something I'm missing here?

~~~
pius
That sounds very plausible and, indeed, I have join tables throughout my apps.
This is why I put the proviso in, "correct me if I'm wrong." :)

My understanding was that hardcore relational database guys would say that
join _operations_ are necessary when the data's totally denormalized, but
having a join _table_ wasn't necessarily a best practice because now you've
got an additional table that could potentially get out of sync.

------
mattrepl
I'm surprised no one brought up column-oriented databases.

When attributes can be added dynamically and/or have a small value set,
column-oriented databases with bitmap indices outperform traditional
relational databases.
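
To make the bitmap index idea concrete, here is a toy sketch in plain Python. It is not how MonetDB or any particular engine implements it; it assumes each column is a simple list and uses Python integers as bitsets. The column names and values are invented.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return index

def matching_rows(bitmap):
    """Return the row ids whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Two columns with small value sets, stored column by column.
status = ["open", "closed", "open", "open", "closed"]
region = ["eu", "eu", "us", "eu", "us"]

status_index = build_bitmap_index(status)
region_index = build_bitmap_index(region)

# "status = 'open' AND region = 'eu'" becomes a single bitwise AND of two bitmaps.
open_in_eu = matching_rows(status_index["open"] & region_index["eu"])
```

With few distinct values per column there are few bitmaps to keep, which is why the small-value-set case is the favorable one in this sketch.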

See MonetDB for an open source, usable column-oriented db with bitmap index
support: <http://monetdb.nl/>

For more kvetching on the topic:
<http://dynamictyping.org/post/29661699#disqus_thread>

I tried to find a public research document with performance comparisons, but
to no avail.

------
workpost
Anybody ever heard of Netezza? I was told about it by someone who works with
vast amounts of data - many terabytes every day. It's part of a hardware
system that uses overwhelming processor power to segment and blast through
data. You don't need the traditional headers or relational database structures
to search like this. The technology is costly now (it's specialized hardware)
but if you want to talk about where databases are going beyond the relational,
this is one direction.

------
joshwa
<http://en.wikipedia.org/wiki/Entity-Attribute-Value_model>

------
andrewparker
If we give up relational databases in favor of a graph model, I'm sure we'll
have piles of blog posts complaining about the sacrifices made in that switch.
That said, the relational database is ancient technology that was built for an
entirely different purpose than for what it's used today, so I eagerly
anticipate a revision.

------
sah
This paper discusses some related reasons why traditional relational databases
are poorly suited to some modern applications:
[http://www.vldb.org/conf/2007/papers/industrial/p1150-stoneb...](http://www.vldb.org/conf/2007/papers/industrial/p1150-stonebraker.pdf)

------
pistoriusp
Whydoeseverythingsuck.com? Probably because you're asking the wrong
questions...

------
edw519
The author is confusing the term "relational databases" with the
implementations of relational database systems that he has encountered.

That's like looking at a bad Python program and saying, "Python sucks," or
like saying, "I've never seen a car go more than 10 miles without breaking
down; therefore cars are not reliable transportation."

You can store just about anything in a RDBMS pretty much any way you want.
You're limited only by your own skill and imagination and the particular
limitations of your vendor's implementation.

A better title would have been, "Here We Go Again: The Death of the Relational
Database Prematurely Announced."

~~~
michaelneale
Time for the obligatory: "rumours of RDBMS death are greatly exaggerated"

------
hank777
I am the author of this article, and I must say, though I am not a regular
reader of HN, I do find it refreshing that there are smart people here arguing
real merits in an intelligent respectful way. I am sure it is not always the
case, but it is really fun to read everyone's perspective.

I don't think graph databases are for everything, but I do think that they will
end up providing a much better abstraction for the kinds of apps we tend to
write on the web. I do think an RDBMS is better for an accounting system for
example. Oh, and my examples were not designed to actually be great real world
examples, but I have a lot of less technical people reading my blog and so my
goal was to provide examples that could be expressed succinctly. That said,
there is no static example that cannot be expressed in a relational database.
The problem is that relational databases (at least the ones that are available
to us) are not at all fluid and flexible.

~~~
mixmax
"my examples were not designed to actually be great real world examples, but I
have a lot of less technical people reading my blog"

This is definitely the place to give a real world example if you have one
handy. Most people here would understand it, we would love the discussion, and
you might even get some good feedback.

And thanks for posting here :-)

~~~
hank777
Well, as I said, there is nothing that can't be expressed in an RDBMS - at
least at first. But let me give an example of the kind of use cases we see.

First, imagine having a database that allows one to freely create record
types. One might have standard data types like contacts, events, emails,
checks, expense reports, etc.

These record types are nodes. Now imagine being able to connect these nodes
using any type of edge you like. For example a contact might be connected to
an event as an "invitee". That's how the edge would be labeled. Now the
relational folks will say that that is a relationship that could be predicted.
But at some point, some new type of record is created. And you as a user want
to connect that record to existing records. For example you have added a
"shoe" record type to keep track of all of your shoes. You then decide you
want shoes to be connected to events so that you can map what shoes you wore
to what events. You don't want to modify your schema. You don't want to add a
new mapping table, you just want to connect the record. And you want to be
able to query the graph for all the things of any type that are connected to
that record. More importantly, you want the _end user_ to be able to decide
that it would be useful to connect shoes to events since no self respecting
programmer is ever going to design such a system.

This is the type of flexibility that you need in a web application that will
evolve over time. But the minute you want to connect that new record type to
the existing object, you either have to modify your schema, or you have
created a database that is highly flexible via totally generalized mapping
tables, but is not optimized for these kinds of structures. For example just
creating a giant mapping table to connect objects will work in an RDBMS but it
is not at all optimized and will fall over at scale. Since we are building
something that will handle awesome scale, using an RDBMS in this way was a
non-starter. Philosophically, we probably have more in common with Google
BigTable than with an RDBMS.
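
For readers who want to see what the "totally generalized mapping table" looks like, here is a small sketch assuming SQLite via Python's `sqlite3` module. The `node` and `edge` table names, the `worn_to` label and the sample rows are invented for illustration. It shows the shape of the schema only and says nothing about how it behaves at scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, type TEXT, name TEXT);
    CREATE TABLE edge (src INTEGER, dst INTEGER, label TEXT);
    CREATE INDEX edge_src ON edge (src);
    CREATE INDEX edge_dst ON edge (dst);
    INSERT INTO node VALUES (1, 'contact', 'Alice'), (2, 'event', 'Launch party');
    INSERT INTO edge VALUES (1, 2, 'invitee');
""")

# A brand new record type is just more rows: no new table, no schema change.
conn.execute("INSERT INTO node VALUES (3, 'shoe', 'Red sneakers')")
conn.execute("INSERT INTO edge VALUES (3, 2, 'worn_to')")

def connected_to(node_id):
    """All things of any type connected to a record, in either edge direction."""
    return conn.execute("""
        SELECT n.type, n.name, e.label
        FROM edge e JOIN node n ON n.id = e.src
        WHERE e.dst = :id
        UNION ALL
        SELECT n.type, n.name, e.label
        FROM edge e JOIN node n ON n.id = e.dst
        WHERE e.src = :id
        ORDER BY 1, 2
    """, {"id": node_id}).fetchall()

party_links = connected_to(2)
```

The flexibility comes from pushing record types and edge labels into data, which also means the database can no longer use per-table structure to plan queries.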

~~~
staticshock
_you either have to modify your schema, or you have created a database that is
highly flexible via totally generalized mapping tables, but is not optimized
for these kinds of structures_

A generalizable mapping schema with tables for edges may not be optimal, but
your comparison seems to be a bit of a bait-and-switch. Why compare the
optimality of such a schema to a rigid schema instead of comparing it to the
optimality of an alternative "graph-based" data store?

Granted, an extensible schema will be slow to query/etc. What makes you say
that you can achieve better efficiency using a non-RDBMS approach? (Not that
you can't, but I didn't see your argument to that effect. I'd say that without
such an argument, the optimality/speed point is insignificant.)

~~~
wheels
You can. Definitely. In fact, I've implemented this a few times (most recently
last week); some for specific problems, some for more generic graph support.

In a nutshell:

The problem with RDBMS approaches is that the good ones assume you can pack
your complex logic into a monster query or stored procedure and let the query
optimizer do its thing. But if you're implementing an attribute-value system
or graph traversal on top of an SQL database, you end up generating a
ginormous number of queries just to do some basic traversal. You could
potentially wrap those into a stored procedure that was doing selects into a
temporary table, but that's not really the sort of thing that most query
optimizers go to town on.
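
A small sketch of the query-per-step pattern described above, assuming an edge table in SQLite accessed through Python's `sqlite3` module. The table layout, the sample graph and the `reachable` function are made up for the example.

```python
import sqlite3
from collections import deque

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)],
)

def reachable(start):
    """Breadth-first traversal that issues one SELECT per visited node."""
    seen, queue, queries = {start}, deque([start]), 0
    while queue:
        node = queue.popleft()
        rows = conn.execute(
            "SELECT dst FROM edge WHERE src = ?", (node,)
        ).fetchall()
        queries += 1
        for (dst,) in rows:
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen, queries

nodes, query_count = reachable(1)
```

Even on this five-node graph the traversal costs five round trips to the database, and each query is too small for an optimizer to do anything interesting with.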

On the other hand, there are a number of systems out there that either attempt
to be full object oriented databases, or object relational mappings, or RDF
based stores, but the current off-the-shelf ones tend to perform poorly
since they're not very mature (and I get the feeling are more focused on just
being able to conveniently store stuff, not actually hitting it very hard).

When I first started looking at the sort of problems that Hank's addressing
(in a series of talks I did in 2004 titled "Beyond Hierarchical Interfaces") I
naïvely thought that you could do everything with an SQL backend, tried and
failed. I could blab on about the sort of indexing that you need for these
sorts of storage, but I'll duck out for now.

Edit: Just one example of where I've done this, if anyone cares, was replacing
the old SQL backend with a dynamic (schema-less) attribute-value system and
basic query language, for my current job: <http://grunge-nouveau.net/Kore.mp4>

~~~
staticshock
Now, I may be pretty naive here, but if you're doing full on graph traversal,
why not just extract the full graph from the database and traverse it in
memory on your own terms instead of leaving it to large unoptimized traversal
queries?
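
A sketch of what that could look like in Python, assuming the whole edge list fits in memory. The `edges` list stands in for rows fetched with a single query, and the names are hypothetical.

```python
from collections import defaultdict, deque

# Stand-in for the result of one "SELECT src, dst FROM edge" over the whole table.
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]

# Build an adjacency list once, then traverse without touching the database.
adjacency = defaultdict(list)
for src, dst in edges:
    adjacency[src].append(dst)

def reachable_in_memory(start):
    """Breadth-first traversal over the in-memory adjacency list."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for dst in adjacency.get(node, ()):
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen

nodes_in_memory = reachable_in_memory(1)
```

The trade-off is that memory use grows with the total number of edges, not with the part of the graph a given traversal actually touches.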

~~~
wheels
For the latest data set that I'm working on there are 5 million nodes and 50
million edges, and each one has some meta-data associated with it. :-)

