

Should You Go Beyond Relational Databases? - olegp
http://thinkvitamin.com/dev/should-you-go-beyond-relational-databases/

======
timwiseman
This was and interesting article, but there I would have a significant quibble
with it.

His list of "symptoms" that a relational database is not right for you are
more often symptoms that your relational database was not well designed (or
that the problem you were trying to solve changed along the way) rather than
that the relational model itself does not fit your needs.

I think this applies to all his "structural symptoms" but especially with "Do
you have tables with lots of columns, only a few of which are actually used by
any particular row?" This is more often a sign that the database was not
properly normalized than anything else.

This of course is not to say that the relational model is right for
everything. In some cases, object oriented databases are the way to go and if
you truly need a vast level of scalability and you can afford to relax the
ACID standard then it makes sense to look at non-relational options.

There is room in the world for both relational and non-relational models, but
I do not think fashion should play a role in choosing a technology personally.

~~~
martinkl
If your objects have a wide variety of attributes, either you need lots of
columns with many of them empty, or you need some sort of attribute table with
object-key-value triples. Both are bad relational design, but the structure is
inherent in the data. If you're having to use a fixed DB schema, I can't see
any way around bad design.

When you mention object databases, can you name any examples? As far as I know
object databases where talked about a lot some years ago, but never really
gained significant widespread adoption.

Also, the article makes the point that non-relational databases are not just
about scalability. In fact, in the case of graph databases, scaling problems
are just as pronounced as with relational databases. The different data models
can enable you to make good database designs, even if the structure of your
data is against you.

~~~
lneves
_If your objects have a wide variety of attributes, either you need lots of
columns with many of them empty, or you need some sort of attribute table with
object-key-value triples. Both are bad relational design, but the structure is
inherent in the data. If you're having to use a fixed DB schema, I can't see
any way around bad design._

PostgreSQL HStore is very handy in situations such as this:
<http://www.postgresql.org/docs/8.3/static/hstore.html>

~~~
DannoHung
So your solution is to point to a database capacity that violates the first
normal form?

~~~
fhars
Um, no. "Set of key-value pairs" can be a perfectly valid atomic data type as
far as the database is concerned, and so a database containing tables with
hstore columns can be in any normal form you want, just like a database with
varchar columns can be normalized, as long as you don't give the contents of
the varchar or set any structural meaning as far as the database is concerned.
There is nothing in the relational model that forbids storing composite values
(even whole other database tables [edit: of which hstore is just a trivial
example]) as a value of a field. Whether it violates any normal form doesn't
depend on the type of the elements stored, but on their interpretation in your
data model. Although that doesn't mean that I would be surprised if I saw a
use of hstore that is ill-advised. Quite the opposite, actually.

------
andres
I've had problems with MemcacheDB and I'm curious to hear about other people's
experiences.

Basically, I've found that under high read/write loads I get occasional socket
timeouts (tested on linux/osx). I think the underlying reason for the timeouts
are Berkeley DB locks when data gets flushed to disk. Beyond the hassle of
getting intermittent socket timeouts the real problem is that the memcache
client API fails gracefully because it was specifically designed to be fault
tolerant. You can check for socket errors and retry queries but you'll still
get unpredictable query times.

I'm coming to the conclusion that it's an architecture issue - MemcacheDB is a
persistent database abstracted behind an interface specifically designed for
non-persistence. The abstraction leaks when the database locks.

~~~
amix
I have had similar issues like you (i.e. very poor performance of Berkley DB,
especially under load). I am sure thought one can configure this (some options
are listed here: [http://www.oracle.com/technology/documentation/berkeley-
db/d...](http://www.oracle.com/technology/documentation/berkeley-
db/db/api_c/env_set_flags.html) ), I have tried different options, but I can't
recall the effects.

Anyway, after testing MemcacheDB and Tokyo Tyrant in production my conclusion
was to use Tokyo Tyrant instead of MemcacheDB. Tokyo Tyrant implements the
memcached protocol and performs really well under load (and has TONS of
features such as master-master replication, Lua scripting, different types of
engines [hash, b-tree or memory]). You can also check LightCloud, which is a
distributed key-value database built on top of Tokyo Tyrant.

~~~
andres
Thanks for the response - I'm glad to hear I'm not the only one experiencing
these issues. I've read a lot of articles recommending MemcacheDB but none
mention these issues. I wish the authors had actually tested MemcacheDB before
writing about it. I wasted a day working on this.

------
russell
Ooh, my head hurts. Too many options, but this article is a good place to see
the variety post-relational databases. It's also good to see that innovation
is alive an well in the database arena. I think that it is also good that all
the options will make a good architect think about his design. To use these
highly distributed designs you give up things, like umpteen indexes per table,
ACID compliance, a column for every day of the year, and 20-way joins.

I think its a good to go away from relational and look at these things and
come back and look at your current practices. Maybe every attribute doesn't
need a column and an index. Maybe use a JSON bag to hold the loose parts.
Maybe throw away a bunch of indexes; create them on the fly for the monthly
report. Maybe you can give away some of your referential integrity.

~~~
YuriNiyazov
FWIW I've been using the JSON bag approach for the loose parts on my current
project, and it's been much easier than "the standard approach" both from
maintenance and speed of feature development.

------
neilc
_Standard SQL cannot query transitive relationships, i.e. variable-length
chains of joins which continue until some condition is reached_

Actually it can: SQL:2003 includes recursive queries (WITH RECURSIVE), and
several popular database systems implement it.

~~~
gnaritas
Which still returns a flat data structure that you have to manually use to
rebuild the tree you're looking for. I you really need the ability to easily
store and retrieve trees of data, don't use a relational database.

~~~
jjs
If you can rebuild the tree using only the information returned in that flat
data structure, and the database performance is good, then why not just
rebuild it in code?

It's trivial to scale out stateless code; databases are the hard part.

~~~
gnaritas
You're not going to get better performance by tearing apart a tree, sticking
into a database that doesn't store it well, and then taking it back out and
rebuilding it from scratch every time. If you want performance, use a database
that will store your data structure as is and avoid the disassemble/reassemble
step as well as avoiding writing the extra code to do it.

Relational databases didn't win because they're fast, they won because they're
flexible for querying and language neutral, both features which necessarily
slow the database down.

> It's trivial to scale out stateless code; databases are the hard part.

Not really relevant to the conversation. There are databases that can store
trees of objects just fine as is, not all databases force you to store
everything in tables.

~~~
jjs
> Not really relevant to the conversation. There are databases that can store
> trees of objects just fine as is, not all databases force you to store
> everything in tables.

To me, the conversation is about trade-offs, and if the alternative database
(for example) forces you to get rid of ACID transactions, then it's not as
cut-and-dry as you purport.

~~~
gnaritas
Uhhggg... who said anything about getting rid of ACID transactions? Seriously,
what kind of FUD have you been reading. Transactions have absolutely nothing
to do with the format a database uses to store it's data. Most object
databases have ACID transactions AND can store your data-structures in their
native format.

 _Some_ of these new fangled _distributed_ databases forgo transactions
because transactions and distribution don't really go well together, they are
opposing forces. Don't confuse issues that have nothing to do with each other,
and don't assume that only relational databases support ACID compliant
transactions.

Look at a real object database like Gemstone that is directly comparable to
your big iron Sql database. These new distributed key/value pair database are
little more than distributed persistent hash tables, they're different beasts
entirely and aren't directly comparable in features because they're meant to
solve different problems.

~~~
jjs
_Now_ this is getting interesting...

Is there anything open-source (and language-agnostic) that's comparable to
Gemstone?

~~~
gnaritas
Gemstone has a free version limited to 4 gig, but few if any object databases
are language agnostic, that would defeat the purpose. Object systems are
language specific, storing them in their native format is thus generally also
language specific.

There's an open source one called Magma in Squeak that's like a Gemstone lite,
but I'd take the Gemstone version any day because it'll scale to any level
you'll ever need, ever. I'm working on a Gemstone project now, and after
having used it, nothing else I've seen comes close to the massive productivity
it offers.

By the way, databases shouldn't be language agnostic, this leads to the ever
present anti pattern of using the database as an integration point between
many programs which turns the database into a giant ball of mud global
variable that becomes impossible to change.

Integration databases suck. Application should own their own data and
integrate with other application via services. That's how the web became so
successful and that's how big ass enterprises should be ran as well, many
small apps loosely coupled, not one giant global db where every app is bound
to a generic schema that isn't suited for what it actually needs.

------
bayareaguy
This article is confused and misleading. Implementation minutiae such as
tables with lots of columns is not the right way to think about the problem of
selecting a database model or technology. The important questions to ask are:
why do we need this data, where does it come from, who or what will need to
use it, what parts are most important, how long must we keep it and what can
we afford.

------
jhancock
Any recommendations for a solid OODB for ruby? I've had great success years
ago with Java based and Smalltalk based ones. I may be in the market for an OO
backend for a new ruby project.

~~~
gnaritas
Maglev will be one soon, and it's just Gemstone under the hood so it'll be bad
ass.

------
bravura
I like the rules of thumbs about whether you want a key-value store or a graph
database or relational DBs.

------
qslug
Where does LucidDB fall into this scheme?

