Object-Oriented Database Management Systems Succeeded

trjordan · on April 7, 2009

Can somebody explain to me the difference between using an OODB to back your web app and using a web framework like Rails or Django? I know the actual implementation is different, but the abstraction presented seems to be the same...

jrockway · on April 7, 2009

There isn't much difference if you only work with single elements and sets of single elements. (Imagine a user with a username and password, and the set of all users.)

Things become more complex when you want to store arbitrary data structures. A simple example is a doubly-linked list. This is easy to represent in memory, and it's easy to represent in a relational database, but it's not particularly easy to translate between the two representations. With an object database, the in-memory structure is the persistent structure, so there is no translation. Anything you have in memory can be put in the database. (Most implementations let you index and query this data, as well, and provide you with opportunities to create indexes that would be difficult or impossible to represent with a relational database management system.)
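To make the translation cost concrete, here is a minimal Python sketch (Python is this edit's choice; the comment names no language). It assumes a hypothetical table layout of `(id, value, prev_id, next_id)` rows and shows the two functions you end up writing to move a doubly-linked list between memory and that table. The claim in the comment is that an object database would persist the pointers directly, so neither function would be needed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    value: str
    # repr=False keeps the printed form from recursing through the links
    prev: Optional["Node"] = field(default=None, repr=False)
    next: Optional["Node"] = field(default=None, repr=False)


def build_list(values):
    """Build an in-memory doubly-linked list and return its head node."""
    head = prev = None
    for v in values:
        node = Node(v, prev=prev)
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head


def to_rows(head):
    """Flatten the list into hypothetical (id, value, prev_id, next_id) rows."""
    nodes = []
    node = head
    while node is not None:
        nodes.append(node)
        node = node.next
    # Pointers have no meaning on disk, so each node gets a surrogate key.
    ids = {id(n): i for i, n in enumerate(nodes, start=1)}
    return [
        (ids[id(n)], n.value,
         ids[id(n.prev)] if n.prev is not None else None,
         ids[id(n.next)] if n.next is not None else None)
        for n in nodes
    ]


def from_rows(rows):
    """Rebuild the pointers: first create every node, then link them by key."""
    nodes = {row_id: Node(value) for row_id, value, _, _ in rows}
    head = None
    for row_id, _, prev_id, next_id in rows:
        node = nodes[row_id]
        node.prev = nodes[prev_id] if prev_id is not None else None
        node.next = nodes[next_id] if next_id is not None else None
        if prev_id is None:
            head = node
    return head


def values(head):
    """Walk the list forward and collect the stored values."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out
```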

Also, a key feature of OOP is polymorphism, which the relational model does not handle elegantly. Object databases handle it fine.

kingkongrevenge · on April 7, 2009

We use object store for exactly the kind of thing he argues it should be used for (slightly analogous to CAD). I think it sucks. A normalized relational schema would be much better. Instead we suffer endless invalid pointer crashes and inconsistent data all over the place. Changing the schema is also a nightmare.

OODBs are a seductive hack. They can look like a great idea because initial development is simple. You get to skip the laborious step of developing a proper relational schema and setting up mechanisms to ensure integrity. Two years later your data is an inflexible mess with subtle integrity errors here and there.

jrockway · on April 7, 2009

> Two years later your data is an inflexible mess with subtle integrity errors here and there.

Well yeah, if you do it wrong, it doesn't work.

> You get to skip the laborious step of developing a proper relational schema and setting up mechanisms to ensure integrity.

Basically, every program needs to ensure consistency of its in-memory structures. If you have invalid data in memory, you can't trust your program to return correct results. With an OODB, you reuse these integrity checks for your data model. With a relational database, you have to specify your constraints a second time.

Twice as much code == twice as many bugs.
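A minimal sketch of the "reuse the checks" argument, assuming a hypothetical `Account` class: the invariant lives in one `check` method that guards every in-memory change. With an object database, the persisted object is this same object; with a relational schema, the rule would be stated a second time, for example as a `CHECK (balance >= 0)` constraint.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical domain object whose invariant is defined in one place."""
    owner: str
    balance: int  # in cents

    def __post_init__(self):
        self.check()

    def check(self):
        # The single definition of "valid" for this object. A relational
        # schema would restate it, e.g. CHECK (balance >= 0).
        if not self.owner:
            raise ValueError("owner must be non-empty")
        if self.balance < 0:
            raise ValueError("balance must not be negative")

    def withdraw(self, amount):
        self.balance -= amount
        try:
            self.check()
        except ValueError:
            self.balance += amount  # undo the in-memory change, then report it
            raise
```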

kingkongrevenge · on April 7, 2009

Tons of first hand experience and expert research says that relational normalization is the best way to avoid mistakes.

> If your in-memory data doesn't make sense, then who cares what's stored?

You are going to screw up your in-memory data structures in a complex application. It WILL happen. A normalized relational schema will catch the mistake.

You can go 3NF, with decades of experience and theory. Or you can wave your hands and say "don't make mistakes" in your hierarchical/OODB system.

If you need simple persistence for an app that will be dead inside a few years, go knock yourself out with an OODB. It will work fine. If you're doing something complex, like say the equivalent relational DB would have a few dozen tables, I think it's foolish. You really probably aren't smarter than the collective experience that has successfully settled on the relational model.

jrockway · on April 7, 2009

> You really probably aren't smarter than the collective experience that has successfully settled on the relational model in recent decades.

The same collective that thinks Java is the right thing for teaching, research, and industry?

</strawman>

But seriously, the industry at large is usually about 40 years behind the state of the art. Look at all the features that your programming language of choice doesn't have, even though we've known them to be good ideas for 40+ years.

Finally, data is not some sacred thing that is set in stone forever. If you need a different schema for a different application, then translate it. When I worked in data warehousing for a large advertising company, we had to take the data collected from the ad servers and translate it into a variety of forms: one for archival analysis with specialized tools, and another for showing the data in the web interface. The schema that could cope with tracking millions of hits a minute did not do well for the off-the-shelf analytics tools or the web UI. (We used relational databases for all three systems, but we didn't have to. The only one that had to be a relational database was the system behind the off-the-shelf software.)

silentbicycle · on April 7, 2009

For what it's worth, "What language is the right thing for teaching, research, and industry?" is a complex (and, ultimately, pretty ill-defined) question, but "What techniques can be used to organize information whose internal consistency must be mathematically verifiable?" is not.

arohner · on April 7, 2009

Naive question: what mathematically verifiable properties do current SQL DBs have that current OODBs cannot?

Also remember that, strictly speaking, the following things are not the same: modern SQL DBs, all RDBMSes, and pure relational theory. I can easily imagine an RDBMS that does not use a SQL interface, and modern DBs are not the same as Codd's pure conceptual work.

silentbicycle · on April 7, 2009

To the first question: I don't know which properties are verifiably impossible to have in an OODB, but I would be quite interested. I think one would first need a more thorough definition of what actually constitutes an OODB (and OO, for that matter).

To the latter: Yes, of course. Tutorial D and all that.

Personally, I'm not doing anything at the moment for which Twitter-esque scalability is really a requirement (though hardly anyone is, honestly). PostgreSQL is a sufficiently large RDBMS for my purposes, SQLite works nicely as a smaller one, and below that, in-memory Lua tables, Python dicts, or Scheme sexps are fine. I'm mostly curious about what alternative database-like tools could be made for roughly the same domain as SQLite. To some extent, that could overlap with objects, but OOP is not in any way a requirement for me. (There may be some intersection with filesystems, too; they're databases, after all.)

jules · on April 7, 2009

In my experience, keeping objects consistent is easier than keeping the database consistent. The database can only do trivial consistency checks. With objects you can write arbitrary predicates that check things about your objects. In the applications I write that means that even though I use a relational database, I end up doing all important checks on the objects, and none in the DB.

neilc · on April 7, 2009

> The database can only do trivial consistency checks.

Between foreign keys, triggers, and CHECK constraints, databases can actually do pretty sophisticated consistency checks. There's also a SQL standard feature for database-wide assertions that would be very useful, but AFAIK no one implements it (probably for performance reasons).
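A small illustration of those three mechanisms, using SQLite through Python's standard `sqlite3` module. The vendor/invoice tables and the append-only rule are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and, consistent with the comment, it does not implement `CREATE ASSERTION`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE vendor (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL CHECK (email LIKE '%_@_%')   -- column-level CHECK
);
CREATE TABLE invoice (
    id        INTEGER PRIMARY KEY,
    vendor_id INTEGER NOT NULL REFERENCES vendor(id), -- foreign key
    issued_on TEXT NOT NULL,                          -- ISO dates compare as text
    due_on    TEXT NOT NULL,
    CHECK (due_on >= issued_on)                       -- rule spanning two columns
);
-- A trigger covers rules the declarative constraints cannot express,
-- here a hypothetical "invoices are never deleted" policy.
CREATE TRIGGER invoice_no_delete BEFORE DELETE ON invoice
BEGIN
    SELECT RAISE(ABORT, 'invoices are append-only');
END;
""")

conn.execute("INSERT INTO vendor VALUES (1, 'sales@example.com')")
conn.execute("INSERT INTO invoice VALUES (1, 1, '2009-04-01', '2009-05-01')")


def rejected(sql, params=()):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql, params)
    except sqlite3.DatabaseError:  # IntegrityError is a subclass
        return True
    return False
```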

gnaritas · on April 8, 2009

Actually, no, it can't, unless you code all your business rules into the db as well. The advantage the objects have is that they're also the application and all the rules are there and checkable.

neilc · on April 8, 2009

> Actually, no, it can't, unless you code all your business rules into the db as well.

Well, you can code all your business rules into the DB, in the form of stored procedures. But even if you choose not to, I don't see why that fact prevents you from enforcing a wide range of consistency constraints in the database.

gnaritas · on April 8, 2009

I didn't say you can't do any checks, rather, the db does mostly simple checks. The complex checks often depend heavily upon the object model and use of features like polymorphism which relational databases suck at modeling.

neilc · on April 8, 2009

Complex business logic constraints depend on "the object model and polymorphism"? Can you give an example?

gnaritas · on April 9, 2009

Objects can validate themselves, and object databases store the class as well as the instances. This allows a database to contain a mix of different versions of a particular class of objects. The invoice from yesterday might have 5 fields and validate under one set of rules, while today's invoice has 7 fields and validates itself under a completely different set of rules.

Such polymorphism is trivial to implement with objects and very painful to implement within a relational database. It's not a matter of whether an RDBMS can model something; it's a matter of how much effort is required.
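A sketch of the versioned-invoice scenario in Python, under loose assumptions: the two classes, their fields, and their rules are invented, and a plain list stands in for an object database holding instances of both versions. What it shows is that the caller asks each object to validate itself and never branches on which version it holds.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class InvoiceV1:
    """Hypothetical 'yesterday' invoice: five fields, original rules."""
    number: int
    vendor: str
    issued_on: date
    total_cents: int
    currency: str

    def validate(self):
        return self.total_cents >= 0 and self.currency == "USD"


@dataclass
class InvoiceV2:
    """Hypothetical 'today' invoice: seven fields, different rules."""
    number: int
    vendor: str
    issued_on: date
    total_cents: int
    currency: str
    tax_cents: int
    po_number: str

    def validate(self):
        return (self.total_cents >= 0
                and self.currency in {"USD", "EUR"}
                and 0 <= self.tax_cents <= self.total_cents
                and self.po_number != "")


def invalid_invoices(store):
    # Each object carries its own rules; no version checks here.
    return [inv.number for inv in store if not inv.validate()]


# A list standing in for an object database with mixed versions.
store = [
    InvoiceV1(1, "acme", date(2009, 4, 8), 1000, "USD"),
    InvoiceV2(2, "acme", date(2009, 4, 9), 1000, "EUR", 80, "PO-7"),
    InvoiceV2(3, "acme", date(2009, 4, 9), 1000, "EUR", 80, ""),
    InvoiceV1(4, "acme", date(2009, 4, 8), 1000, "EUR"),
]
```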

silentbicycle · on April 8, 2009

> I end up doing all important checks on the objects, and none in the DB.

Databases, particularly relational databases, have a LOT of functionality specifically for enforcing constraints and consistency checks on data. You will almost certainly need to organize your schema differently, but you can delegate a lot of that checking onto the database engine, and it will probably speed everything up -- you'll be passing quite a bit less data back and forth just to verify it, and databases are generally optimized heavily for that sort of checking. (Databases are also great at dealing with data as collections, unsurprisingly.)

All bets are off if you're using a really crappy RDBMS, though. (Postgresql is quite good, in my experience.)

jules · on April 7, 2009

Do you think this is a feature of the particular OODB you're using, or of OODBs in general? Do you think your problems can be solved by changing the OODB (adding a static type system, for example)?

kingkongrevenge · on April 7, 2009

Of course the commercial OODBs are statically typed. I'm not sure what you're getting at.

OODBs are fundamentally flawed in the way you have to manage pointers between objects. You also inevitably wind up with copied data.

jrockway · on April 7, 2009

As I state in another thread, you are thinking of document databases, not object databases. If you have two copies of the same piece of data, you're misusing the database.

I'm also not sure what the problem with pointers is. They work fine for in-memory data, so why would they not work for persistent data?

kingkongrevenge · on April 7, 2009

Complex in memory object graphs do not work fine. Programmers inevitably screw up. In complex systems you also inevitably get denormalized copies of objects in an OODB.

You're not saying anything that's incorrect. It's true, anything you can do with a relational db you can theoretically do with an OODB. In many cases you likely will finish with less code. However, this theoretical fact flies in the face of a lot of experience that indicates the superiority of the relational model.

jrockway · on April 7, 2009

The relational model doesn't prevent programmer fuckups, nor does it save them from many common errors. For example:

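    sub add_one_to_each_row {
        my $row = shift;    # a row object (DBIx::Class-style accessors assumed)
        # Bug on purpose: the name says "add one", but the code subtracts one.
        $row->some_relevant_column( $row->some_relevant_column - 1 );
        $row->update;       # writes the wrong value back to the table
    }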

Although the data is fully relational, your program corrupted it anyway.
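The same point in Python with the standard `sqlite3` module, applied to the whole table at once instead of row by row. The `item` table and its `CHECK (quantity >= 0)` constraint are hypothetical; the constraint is still satisfied after the buggy update, which is exactly why it cannot catch the mistake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE item (
    id       INTEGER PRIMARY KEY,
    quantity INTEGER NOT NULL CHECK (quantity >= 0)
)""")
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, 5), (2, 9)])


def add_one_to_each_row(conn):
    # Same bug as the Perl version: the name says +1, the code does -1.
    conn.execute("UPDATE item SET quantity = quantity - 1")


add_one_to_each_row(conn)
quantities = [q for (q,) in conn.execute("SELECT quantity FROM item ORDER BY id")]
# Every constraint still holds, yet each value is off by 2 from the intended one.
```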

(As a slightly related aside, I really need to write an essay about how type systems don't prevent bugs when you use overly broad types like "Integer".)

kingkongrevenge · on April 7, 2009

That's not a relevant example.

jrockway · on April 8, 2009

The point is, at some point your application can touch the data, and a bug in the application can easily ruin it... regardless of the underlying database topology.

Additionally, it's easier to create rich type constraints in object systems than it is in SQL.
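One way to read "rich type constraints", sketched in Python with hypothetical `DateRange` and `Schedule` types: each range must run forward, and no two bookings in a schedule may overlap. A per-row SQL `CHECK` cannot see other rows, so the second rule would need a trigger or an engine-specific feature such as PostgreSQL's exclusion constraints.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Inclusive date range that refuses to run backwards."""
    start: date
    end: date

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError(f"{self.start} is after {self.end}")

    def overlaps(self, other):
        return self.start <= other.end and other.start <= self.end


class Schedule:
    """Hypothetical type: a collection of bookings that never overlap."""

    def __init__(self):
        self._bookings = []

    def book(self, rng):
        # The constraint spans the whole collection, not a single value.
        clash = next((b for b in self._bookings if b.overlaps(rng)), None)
        if clash is not None:
            raise ValueError(f"{rng} overlaps {clash}")
        self._bookings.append(rng)

    def __len__(self):
        return len(self._bookings)
```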

(Note: Java and C++ do not count as OO languages. They are missing too many features.)

jules · on April 7, 2009

I was thinking about a type system that forces all pointers to be non-null, with a Maybe/Option type for optional data.
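A Python approximation of that idea, with hypothetical `Employee` and `Department` classes: the department reference is required, and the manager reference is explicitly `Optional`. Python does not enforce annotations at runtime, so the non-null rule is checked by hand in `__post_init__`; a static checker such as mypy would also reject passing `None` for `department`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Department:
    name: str


@dataclass
class Employee:
    name: str
    department: Department                # required reference, never None
    manager: Optional["Employee"] = None  # optional reference, made explicit

    def __post_init__(self):
        # Annotations are not enforced at runtime in Python, so the
        # non-null rule for `department` is checked by hand here.
        if self.department is None:
            raise TypeError("department must not be None")

    def manager_name(self) -> str:
        # The "no manager" case has to be handled explicitly; a static
        # checker would flag self.manager.name without the None test.
        return self.manager.name if self.manager is not None else "(none)"
```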

TweedHeads · on April 7, 2009

Objects should be an abstraction layer between atomic data and the user, but data should always be stored atomically in a relational model.

silentbicycle · on April 7, 2009

The problem is that, in practice, these often work against each other. OO-style abstractions and relational schema design can pull developers in very different directions. See, for instance: http://c2.com/cgi/wiki?ObjectRelationalImpedanceMismatch

Also, I suspect the set of programmers that understand how to do OO design well AND have a solid understanding of relational databases is relatively small.

(edited to flesh out a bit)

TweedHeads · on April 7, 2009

I see your point. OO programmers who don't understand the relational model want to 'force' the database to deal with objects, and that is where the impedance mismatch comes from.

In that case, a Relational Objects Database might be a better model.

Let the database store the whole object as an entity, and as the system evolves it can be 'normalized' into more atomic sub-objects, giving the programmer time to grasp the relational model.

For simple systems like a contact list, a contact object would suffice, even if it has many phones or many addresses. Get one contact, put one contact. The more complex the system becomes, the more the schema needs to change to make it easier to maintain and to avoid redundancy while keeping data integrity.

In the end, a complex system would deconstruct complex objects into simple atomic objects (data) related to each other with simple rules.
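A minimal sketch of that progression using SQLite via Python's `sqlite3` and `json` modules. The table names and the single-step migration are assumptions made for illustration: the contact first lives as one JSON document, then its phones are split into their own table.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Stage 1: the whole contact is stored as one entity.
conn.execute("CREATE TABLE contact_doc (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
contact = {"name": "Ada", "phones": ["555-0100", "555-0101"]}
conn.execute("INSERT INTO contact_doc VALUES (?, ?)", (1, json.dumps(contact)))

# Stage 2: as the system grows, phones become their own relation.
conn.executescript("""
CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE phone (
    contact_id INTEGER NOT NULL REFERENCES contact(id),
    number     TEXT NOT NULL,
    PRIMARY KEY (contact_id, number)   -- no duplicate numbers per contact
);
""")
# fetchall() first, so the inserts below don't run while the cursor is open.
for row_id, body in conn.execute("SELECT id, body FROM contact_doc").fetchall():
    doc = json.loads(body)
    conn.execute("INSERT INTO contact VALUES (?, ?)", (row_id, doc["name"]))
    conn.executemany("INSERT INTO phone VALUES (?, ?)",
                     [(row_id, number) for number in doc["phones"]])
```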

TweedHeads · on April 7, 2009

Not necessarily, but I get your point.

When I say Invoice.Select(1234), I want the data engine to query 20 nicely normalized tables and bring me back one object with the invoice I am looking for. The same goes for Invoice.Update().

Most DBAs are very jealous of their territory and rarely collaborate with programmers on how to make that process transparent to both of them.
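A sketch of what `Invoice.Select(1234)` might do behind the scenes, written in Python against SQLite. The four-table schema, the `select` classmethod that takes a connection, and every column name are hypothetical; a real invoice would span more tables, and the corresponding `Update()` is not shown.

```python
import sqlite3
from dataclasses import dataclass
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor  (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, unit_cents INTEGER);
CREATE TABLE invoice (id INTEGER PRIMARY KEY, vendor_id INTEGER REFERENCES vendor(id));
CREATE TABLE invoice_line (
    invoice_id INTEGER REFERENCES invoice(id),
    line_no    INTEGER,
    product_id INTEGER REFERENCES product(id),
    qty        INTEGER,
    PRIMARY KEY (invoice_id, line_no)
);
INSERT INTO vendor  VALUES (1, 'Acme', 'sales@acme.example');
INSERT INTO product VALUES (1, 'widget', 250), (2, 'gadget', 1000);
INSERT INTO invoice VALUES (1234, 1);
INSERT INTO invoice_line VALUES (1234, 1, 1, 4), (1234, 2, 2, 1);
""")


@dataclass
class Line:
    product: str
    qty: int
    unit_cents: int


@dataclass
class Invoice:
    id: int
    vendor_name: str
    vendor_email: str
    lines: List[Line]

    @classmethod
    def select(cls, conn, invoice_id):
        """Assemble one Invoice object from several normalized tables."""
        head = conn.execute(
            """SELECT i.id, v.name, v.email
               FROM invoice i JOIN vendor v ON v.id = i.vendor_id
               WHERE i.id = ?""", (invoice_id,)).fetchone()
        if head is None:
            return None
        lines = [Line(*r) for r in conn.execute(
            """SELECT p.name, l.qty, p.unit_cents
               FROM invoice_line l JOIN product p ON p.id = l.product_id
               WHERE l.invoice_id = ? ORDER BY l.line_no""", (invoice_id,))]
        return cls(head[0], head[1], head[2], lines)

    def total_cents(self):
        return sum(ln.qty * ln.unit_cents for ln in self.lines)
```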

jrockway · on April 7, 2009

Why? So you can write more code, and hack around the O/R impedance mismatch?

If objects are what your application code interacts with, why would you want to translate them to an unrelated form just to store them on disk?

TweedHeads · on April 7, 2009

Do you know how many entities form an invoice?

Do you know what happens if one of those entities changes, like a phone number, a vendor's email or a product's packaging?

Instead of updating one entity, you will need to update every single object that contains the wrong data.

How do you maintain integrity if redundancy is part of your model?

jrockway · on April 7, 2009

> Instead of updating one entity, you will need to update every single object that contains the wrong data.

No, this is not true. Your invoice has a reference to the vendor. If you update that vendor object, when you request the invoice, you'll get the updated vendor object.

Your complaint describes problems associated with document databases. Document databases are not object databases. (For example, CouchDB is a document database, not an object database. So if you hate CouchDB, you might not hate object databases.)
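A minimal in-memory Python sketch of the reference semantics described above, with hypothetical `Vendor` and `Invoice` classes. It shows only the in-memory behaviour; the comment's claim is that an object database keeps these references intact when it persists the objects.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    name: str
    email: str


@dataclass
class Invoice:
    number: int
    vendor: Vendor  # a reference to the shared Vendor object, not a copy


acme = Vendor("Acme", "old@acme.example")
invoices = [Invoice(n, acme) for n in (1, 2, 3)]

# One update to the vendor object...
acme.email = "new@acme.example"
# ...is visible through every invoice that references it.
```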

TweedHeads · on April 8, 2009

"Your invoice has a reference to the vendor."

Relational model.

Case closed.

jrockway · on April 8, 2009

You do know that the "relational" model refers to relations, not relationships, right?

slpsys · on April 7, 2009

Objects should definitely just be an abstraction of the data, but I'm not sure where your specific requirement for a relational model comes from.

TweedHeads · on April 7, 2009

Codd did his job very well when he set the model 40 years ago based on set theory and predicate logic. It has to do with atomicity, redundancy, validity and integrity of data.

Any object you can imagine can be decomposed into atomic parts that can be stored individually with their respective relations to each other.

jrockway · on April 7, 2009

> Any object you can imagine can be decomposed into atomic parts that can be stored individually with their respective relations to each other.

Uh, yes? What makes you think that this isn't how an OODB works? Generally, each object is stored as a graph of its dependencies. (A class has attributes, and an instance of a class is a collection of instances representing these attributes. This is what's stored in the database... and in memory, for that matter.)
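Python's standard `pickle` module is not an object database, but it illustrates the graph-storage idea: serializing a graph in one call preserves shared references, so two invoices that point at one vendor still point at one vendor after loading. (ZODB, an object database for Python, stores its objects as pickles.) The classes here are hypothetical.

```python
import pickle
from dataclasses import dataclass


@dataclass(eq=False)
class Vendor:
    name: str


@dataclass(eq=False)
class Invoice:
    number: int
    vendor: Vendor


acme = Vendor("Acme")
graph = {"invoices": [Invoice(1, acme), Invoice(2, acme)]}

blob = pickle.dumps(graph)     # serialize the whole object graph in one call
restored = pickle.loads(blob)  # rebuild it as new objects

a, b = restored["invoices"]
# Shared references survive: both invoices point at one restored Vendor.
```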

> Codd did his job very well when he set the model 40 years ago based on set theory and predicate logic. It has to do with atomicity, redundancy, validity and integrity of data.

Care to go into more detail here? Nothing you say has much to do with object databases -- they can be fully ACID, an object in memory is the same object in the database, and they don't magically make a consistent memory image inconsistent. As long as your in-memory data makes sense, the data in your object database will make sense.

If your in-memory data doesn't make sense, then who cares what's stored? You corrupted your data a long time ago, and no set theory is going to fix that.