
The current database debate and graph databases - Anon84
http://blog.neo4j.org/2009/04/current-database-debate-and-graph.html
======
Retric
People who don't understand database theory complain, news at 11.

 _it isn't helpful when the data model evolves over time_

No, poorly designed and maintained databases have issues, but that's a
developer problem not a database problem. SQL is highly flexible and you can
dynamically update the database format at run time, which few "modern"
languages let you do. IMO, the problem is developers who think OOP means all
data must map 1 to 1 with an object which manipulates that data and then
complain that the database is now tightly coupled with their code.
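
To make the run-time schema change concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The engine choice, the `customers` table and the `state` column are all assumptions made up for illustration; other databases have their own `ALTER TABLE` rules.

```python
import sqlite3


def add_column_at_runtime():
    """Create a table, insert a row, then change the schema while data is live."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO customers (name) VALUES ('Ada')")

    # The schema change happens at run time; the existing row picks up the default.
    conn.execute("ALTER TABLE customers ADD COLUMN state TEXT DEFAULT 'unknown'")
    conn.execute("INSERT INTO customers (name, state) VALUES ('Bob', 'CA')")

    rows = conn.execute(
        "SELECT name, state FROM customers ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


rows = add_column_at_runtime()
```

Code that only ever asked for `name` keeps working after the change, which is the loose coupling being argued for here.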

As to scalability, that's a software problem that has little to do with
the relational model. "A links to B, and B is somewhere out in the cluster"
works just fine. You can even write a relational engine that uses a key-value
store to keep the records.
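
A toy sketch of that last idea, assuming a plain Python `dict` stands in for the key-value store. `KVTable` and `join` are hypothetical names invented here, not any real engine's API, and a real system would need indexes, transactions and a query planner on top.

```python
class KVTable:
    """A toy table whose rows live in a shared key-value store."""

    def __init__(self, store, name):
        self.store = store  # any mapping; here a dict, in principle a remote store
        self.name = name

    def insert(self, pk, row):
        # Rows are keyed by (table name, primary key).
        self.store[(self.name, pk)] = dict(row)

    def get(self, pk):
        return self.store.get((self.name, pk))

    def scan(self):
        # Full scan: yield every row that belongs to this table.
        for (table, pk), row in self.store.items():
            if table == self.name:
                yield pk, row


def join(left, fk_field, right):
    """Lookup join: for each left row, fetch the referenced right row by key."""
    for _, row in left.scan():
        target = right.get(row[fk_field])
        if target is not None:  # inner-join semantics: skip dangling references
            prefixed = {f"{right.name}.{k}": v for k, v in target.items()}
            yield {**row, **prefixed}


store = {}
a = KVTable(store, "a")
b = KVTable(store, "b")
b.insert(1, {"label": "B-one"})
a.insert(10, {"name": "A-ten", "b_id": 1})
a.insert(11, {"name": "A-eleven", "b_id": 2})  # points at a row that does not exist

result = list(join(a, "b_id", b))
```

The join never cares where `b`'s rows physically live; it only needs a `get` by key, which is the "B is somewhere out in the cluster" case.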

PS: I like to think of it as Java/C++/whatever infecting the database.

~~~
wheels
Spoken like someone who has never implemented a large graph in a relational
database.

One of the main things you do with a graph is traversal, not just recalling
edges. Doing graph _traversal_ in an RDBMS is beyond painful. Every edge you
traverse typically requires one query. Sending that rate of queries to an
RDBMS isn't anywhere close to efficient.
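
A minimal sketch of the pattern being described, assuming an `edges(src, dst)` table in an in-memory SQLite database (both are made up for illustration). Each step of the walk is its own query, so the query count grows with the part of the graph you touch; against a networked database each of those would be a round trip.

```python
import sqlite3
from collections import deque


def reachable(conn, start):
    """Breadth-first traversal that issues one SELECT per node it expands."""
    seen = {start}
    queue = deque([start])
    queries = 0
    while queue:
        node = queue.popleft()
        rows = conn.execute(
            "SELECT dst FROM edges WHERE src = ?", (node,)
        ).fetchall()
        queries += 1  # one query for every step of the walk
        for (dst,) in rows:
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen, queries


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")],
)

seen, queries = reachable(conn, "a")
```

Five nodes cost five queries here; a walk over millions of nodes costs millions.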

As for key-value stuff, that can be done pretty easily in an RDBMS, but is
often more natural in databases geared specifically towards that. If you'd
prefer not to use those sorts of systems, by all means, don't.

Relational databases break down at the point that the sort of queries that
you're interested in running don't map flexibly to a relational model. You can
insist this isn't the case with handwaving and saying the guy that wrote the
DB doesn't know anything about databases, but Google, Amazon, eBay and
LinkedIn (just to name the few that come to mind off the top of my head)
disagree.

~~~
justinsb
It's about using the right tool for the job. It seems most of the companies
you mentioned have decided that relational databases are actually the right
tool. You mentioned...

Google: Proprietary stack for document-orientated systems (search, GMail), use
MySQL to run AdWords (their revenue business).
[http://xooglers.blogspot.com/2005/12/lets-get-real-
database....](http://xooglers.blogspot.com/2005/12/lets-get-real-
database.html)

Amazon: _Major_ Oracle customer. Functionality is managed by modular teams,
each of which has their own architectural choices, so difficult to generalize.
Certainly in the early days was entirely Oracle (and look, it's a story about
Oracle failing!) [http://glinden.blogspot.com/2006/03/early-amazon-oracle-
down...](http://glinden.blogspot.com/2006/03/early-amazon-oracle-down.html)
Want _you_ to use SimpleDB, but difficult to find Amazon groups actually
eating the dog-'food'.

eBay: Oracle.
[http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29...](http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf)

LinkedIn: Oracle & MySQL. Custom caching layer on top to get round the 'graph'
problems you talked about. <http://hurvitz.org/blog/2008/06/linkedin-
architecture>

OK, what about some others:

Facebook: MySQL. Memcached for caching. Custom data stores for images and for
inbox search (so non-relational for document storage again)
[http://glinden.blogspot.com/2008/05/scaling-facebooks-
databa...](http://glinden.blogspot.com/2008/05/scaling-facebooks-
databases.html)

Wikipedia: MySQL. They're document storage, so it _is_ possible to get
document storage to work on MySQL. [http://venublog.com/2008/04/16/notes-from-
scaling-mysql-up-o...](http://venublog.com/2008/04/16/notes-from-scaling-
mysql-up-or-out/)

Maybe it's just not sexy to stand up at a conference and say 'We architected
this system to run on a relational database, in the normal proven way, and it
runs wonderfully in production. We scaled it using partitioning, in the normal
way. We're generating real money with the system.' You only hear a biased
sample, which is why you think the companies you named are good examples of
people that use non-relational databases.

~~~
wheels
We seem to be oscillating between "key value stores are the way!" and
"relational databases are the way!"

Both have their places. If my notes were read as seeming to advocate the end
of relational databases, please excuse me because that was certainly not my
intent.

Of course all of those companies use relational databases. Often they're the
best tools for the job and in an IT landscape as large as their respective
setups, _of course_ they're the right tool for several jobs. But they've also
got problem domains where they've decided RDBMSes are _not_ the best tools for
the job, which was my point to begin with. This thread started off with the
assertion, basically, that if you weren't using an RDBMS to store everything,
you probably didn't know anything about databases and I feel like we've
(including the original poster) been slowly narrowing in on something closer
to reality.

As for the examples on the non-RDBMS side, BigTable, SimpleDB, Voldemort (+)
and, well, the guy that I met with last week that wrote eBay Germany's search
system and told me they had their own non-RDBMS storage system.

\+ [http://blog.linkedin.com/2009/03/20/project-voldemort-
scalin...](http://blog.linkedin.com/2009/03/20/project-voldemort-scaling-
simple-storage-at-linkedin/)

~~~
justinsb
Let's take the Rails approach: you don't need a non-relational database.

If you're smart enough to know why this isn't true in your case, and you need
to step outside the recommended framework, great. Your application is in the
unusual 0.1% (but are you sure you're not in the 1% of the 99.9% that's just
made a mistake?)

The problem is, all the big names like to name-drop interesting projects, and
people assume that Amazon is built on SimpleDB, or Google is built on
BigTable, or LinkedIn is built on Voldemort, or Facebook is built on
Cassandra, when in reality none of them really are.

Odds are, you're not an Amazon, Google, LinkedIn, Facebook, and the odds are
even more against you being in that sliver of functionality where they're
using this cool tech.

~~~
wheels
I feel like you're arguing with a straw-man here. I've not said that most
sites should avoid relational databases. I've said that there exist
applications where RDBMSes are less than ideal and graph / key-value systems
seem to be a better fit. We seem to agree there. We can argue over how large
that slice is, but since we'd both be making numbers up, it seems rather
pointless.

Your business seems to be invested in pushing out commodity RDBMSes. Mine is
focused on applications that need super fast graph traversal. It's no shocker
that we have a different take on this. :-)

~~~
justinsb
Just read your profile & followed the link to your company. I like the product
you guys are building, and certainly understand where you're coming from now.

I just take exception to the name-dropping of big companies as examples of
companies that have decided against relational databases, when they've only
done so in incredibly narrow niches of very large systems. It would be more
accurate to say they've decided against traditional filesystems, yet we don't
see articles proclaiming the death of the filesystem. This is the real straw
man argument, I'd suggest.

I'm sure there are applications out there for which graph databases are ideal;
the only way to get real numbers is to try to bring a product to market, so I
wish you well!

------
arohner
Often I wonder if people are attacking the wrong problem.

Personally, my "DB" problems mostly revolve around SQL the language, as
opposed to the RDBMS model. SQL is a piss poor language for what most
developers want to accomplish.

1.) SQL queries are not composable, and it's a bear to make more complex
queries by building on top of existing queries. i.e. I have a query to find
all customers who live in CA. Now I want to find the list of customers who
live in CA and bought product Foo in the last year. There are many tools like
ActiveRecord and SQLAlchemy that attempt to sweep some of the complexity under
the rug, but nothing comes close to covering it in the general case, and once
it breaks you're back to writing SQL by hand. I blame the complexity of the
language syntax.

2.) The SQL model is declarative, which is nice when it works, but really
sucks when it doesn't. Being able to look at a query and know whether it is
fast or not depends highly on your DB vendor and product version, and is
basically impossible without the EXPLAIN command. Writing efficient SQL
requires you to know that your vendor optimizes _this_ command, but not _that_
command, but only in versions 8.2 and later.

3.) Sometimes, I really really want to call map, filter and reduce on my
dataset to write my own queries. In some cases, it is much simpler than a page
of SQL. Yes, this isn't purely relational, but I don't care. I want fast,
simple access to my data, and preferably in the same language I'm using for
development. No programming paradigm (imperative, OO, declarative,
functional) is perfectly applicable in all situations. So why does SQL declare
that declarative style is the One True Way to get access to your data?
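
A rough sketch of what points 1 and 3 are asking for, in plain Python over in-memory lists. The `customers` and `orders` data, the field names and the fixed "today" are all invented for illustration. The point is only that the CA query is a value you can keep building on.

```python
from datetime import date, timedelta
from functools import reduce

# Hypothetical in-memory dataset standing in for two tables.
customers = [
    {"id": 1, "name": "Ann", "state": "CA"},
    {"id": 2, "name": "Raj", "state": "NY"},
    {"id": 3, "name": "Lee", "state": "CA"},
]
orders = [
    {"customer_id": 1, "product": "Foo", "date": date(2009, 3, 1), "total": 20},
    {"customer_id": 3, "product": "Bar", "date": date(2009, 2, 1), "total": 15},
    {"customer_id": 3, "product": "Foo", "date": date(2007, 1, 1), "total": 30},
]


def lives_in(state):
    """Predicate factory: does this customer live in `state`?"""
    return lambda c: c["state"] == state


def bought(product, since, all_orders):
    """Predicate factory: did this customer buy `product` on or after `since`?"""
    buyers = {
        o["customer_id"]
        for o in all_orders
        if o["product"] == product and o["date"] >= since
    }
    return lambda c: c["id"] in buyers


today = date(2009, 4, 20)  # fixed so the example is reproducible
since = today - timedelta(days=365)

# The first query...
in_ca = list(filter(lives_in("CA"), customers))
# ...and a second one built directly on top of the first.
in_ca_bought_foo = list(filter(bought("Foo", since, orders), in_ca))
names = list(map(lambda c: c["name"], in_ca_bought_foo))

# reduce: total revenue from product Foo across all orders.
foo_revenue = reduce(
    lambda acc, o: acc + o["total"],
    filter(lambda o: o["product"] == "Foo", orders),
    0,
)
```

This gives up the optimizer and only works on data that fits in memory, which is the trade-off point 2 is complaining about from the other side.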

------
bdfh42
At the moment, the demand for highly scalable post relational (can we say)
databases is fairly limited, although with many governments now keen to try and
store every email, phone conversation and Internet browsing session of their
populace perhaps the demand is going to increase in the near term.

It is my view that the RDBMS has a lot of life in it yet - indeed its
effective penetration is still woefully low. Far too many businesses have no
coherent database in place and rely upon piecemeal record keeping and ad hoc
processes just to manage day to day transactions. Perhaps we need even better
relational solutions before we need post-relational ones.

~~~
davidmathers
"post relational (can we say)"

I think you meant to say "pre-relational"

~~~
jrockway
This doesn't really mean much. The computer science collective loves to ignore
good ideas for many decades.

How many mainstream languages integrate 50-year-old ideas like macros? (Also,
look how long it took for people to realize that things like garbage
collection and virtual machines were good ideas.)

------
spot
an interesting & open graph database from danny hillis & co:

<http://www.freebase.com>

<http://www.freebase.com/view/freebase/faq>

[http://www.freebase.com/view/en/creating_types_and_propertie...](http://www.freebase.com/view/en/creating_types_and_properties)

------
TweedHeads
The relational model is perfect; what is not is the rigid implementation of it
by SQL vendors.

Oracle offers object views and graphs and a million different ways of storing
and retrieving data.

Compare that to MySQL and it feels like riding a bike over rail tracks.

~~~
Femur
As an Oracle DBA by profession, I agree with you 100%. The Oracle product is
more than 30 years old and extremely refined and developed. It can do amazing
things when used properly.

~~~
gnaritas
Yes, it's particularly good at emptying your wallet.

------
leandro
Ðis is ſtupid. People condemn ðe relational model for SQL limitations, but SQL
is not relational at all.

