

Database heresies (The "right" and "wrong" way to do ORM) - toffer
http://www.b-list.org/weblog/2008/aug/04/orm/

======
njm
Sigh. There is so much misinformation being spread by today's software
engineers without proper CS backgrounds. As anybody who's read Date and
friends knows, relational databases are fundamentally more expressive (and
therefore better) than network databases (which is the model used by object
databases), being grounded in set theory and predicate calculus.

Your bad taste for relational databases is more likely a result of using a
crappy ORM pattern (such as Active Record). Go try a proper implementation of
Data Mapper (I can't recommend SQLAlchemy highly enough), then come back and
tell me how you feel.

~~~
ajross
That sounds dangerously like pedantry to me. The point is not that the
mathematical expressiveness of the network model is as powerful or more
powerful than what you can do with relational stuff. The point is that the
network model maps nearly 1:1 with the actual programming model used to
implement the application logic.

All that relational goodness does you no good if you want to architect your
application using an OO data model in a web application server. Maybe you're
trying to argue that that's the wrong way to do web applications? But if you
accept that people _want_ to use Rails/Django/etc... then you have to also
admit the argument that RDBMS's are the wrong tool for the job. This is true
for the same reason that a Dremel is the wrong tool for framing a house,
despite the greater expressive power of the Dremel's cutting tools.

~~~
njm
I'd actually argue the best approach is to develop both sides fully: I agree
that an OO data model is the best fit for non-trivial application logic, but I
also believe relational databases make the best data stores. The two can
coexist just fine, it's just a matter of selecting the proper design patterns
and supporting libraries -- which is why I mentioned SQLAlchemy. In its
developer's words:

"SQL databases behave less like object collections the more size and
performance start to matter; object collections behave less like tables and
rows the more abstraction starts to matter."

This false dichotomy that's always presented is distracting, and leads to
people naively taking religious positions about what should in fact be a non-
issue.

------
gcv
Rant:

I'm afraid I have to side with the author against the "relational set theory
purists." I actually don't care much about styles of ORMs, and I don't even
care that much about having to write SQL, either. I can do, and have done,
both. For me, both work equally well, or rather equally poorly.

I loathe relational databases because, every time I have used one, it has
eaten a vast amount of time in making the schema and queries fast enough. Yes,
I know what third normal form is, and I start my database designs with it. I
know how and when to denormalize to improve join performance. I know how to
look through a query plan and create necessary indices. I've even looked
through low-level IO statistics to figure out where a slow query wastes time
doing unnecessary IO. Guess what: that work sucks. I've used Sybase,
considered a "real" RDBMS, and I used MySQL, considered a "crappy" RDBMS. Both
have sucked about equally, just in slightly different ways.

I am an application and system-level programmer. I want to be able to tell my
app: "store this data for future retrieval." I want the store to be
instantaneous. I want the future retrieval to be instantaneous. Period. As a
major bonus, I want to be able to distribute the store between multiple
machines, but do so as transparently to my application as possible. I also
don't want to risk losing my data when one server loses its disks, so I want
at least semi-transparent replication. ACID semantics are, obviously, a big
plus. No RDBMS gives me this flexibility in designing an app. Oracle RAC,
supposedly the cat's meow of scalable RDBMS, has RW database load balancing,
but it still relies on having a single centralized SAN store on its back end.

To add insult to injury, the promise of arbitrary queries against the data
just doesn't pan out. Yes, I can run an arbitrary query. Overnight. Adding
this query to an application with a non-trivial schema and a large dataset
usually requires so much indexing and query optimization work, that I might as
well write the equivalent retrieval code for a Berkeley DB store by hand. I
hate going through my queries and using trial-and-error to figure out where a
subselect will outperform a join, and I can't afford to hire a bunch of DBAs,
who also happen to know set and relational theory, to do this work for me.

~~~
vegai
It seems the Prevayler or HAppS style of data storage & retrieval is what you
and I want (and perhaps in their hearts, every other non-db-specialist as
well)

Basically, you just define data structures in your language like you would any
other piece of data, and it is silently in the background serialized into an
efficient (I would hope) ACID datastore.

<http://www.prevayler.org>

<http://happs.org>

------
ajross
FTA:

 _It turned out that relational databases were in the right place at the right
time, and the “good enough” implementations and uses took over the world._

Yeah, that's what I've thought for a long time too. It's nice to see that in
writing.

But I guess the rest of the essay leaves me a little cold. If the relational
model is fundamentally flawed (rather: fundamentally mismatched to the
overwhelmingly popular application design paradigm) then isn't the solution
just to chuck the SQL database in the trash and start over?

So what has us tied to SQL and afraid of using the filesystem for storage
directly? What are the _other_ things we get from a database that we want? As
far as I can see, the real need is for atomic transactions and consistent
backups (or replication, which is the flip side of those features). No one
cares about the "relational" stuff at all, really, as proven by all the awful
ORMs out there.

And yet, if that were true, we'd all be happily using Berkeley DB. So what
_else_ am I missing? Some part of me thinks that there's a great startup idea
in this space, but I don't quite see it yet. Tools like Google's App Engine
seem to be attacking part of the problem by at least tweaking around with the
SQL assumptions, but they're still fundamentally relational at their core (and
they're still services, and my gut tells me that most of the world will never
trust their data to someone else's service, no matter how well-justified it
might seem).

~~~
stcredzero
A big part of the reason for Relational DB's continued supremacy, is that it's
become a part of the back-end language of business IT. Even though an OODB
might be a better fit for an application, it's a handicap that your app is not
on top of a relational database. SQL has become a kind of protocol for data
exchange. Unfortunately, it makes for a very loose and sloppy protocol that
completely spoils encapsulation.

~~~
ajross
Sure. I'm not expecting something new to displace SQL at a Fortune 500 IT
shop. But the question is why are otherwise-groundbreaking technologies like
Rails still wedded to the ORM stuff, when clearly there isn't any conservatism
at work there. Anyone willing to bet the company on some weird language from
Japan isn't going to blink about changing their datastore.

~~~
jimgreer
Actually one of the most interesting things at RailsConf was MagLev, which
applies the GemStone object persistence engine to Ruby. Apparently there are
some pretty big Smalltalk apps out there using GemStone.
<http://chadfowler.com/2008/6/5/maglev> and <http://ruby.gemstone.com/>

~~~
stcredzero
One of the interesting technologies that Gemstone has had in its portfolio for
years is the ability to make its data available in relational/SQL DB form. You
can have your app running on top of an OODB, but still interface with your
Fortune 500 comrades who want to do SQL queries for Crystal Reports.

