
OrmHate - jgrodziski
http://martinfowler.com/bliki/OrmHate.html
======
ef4
I think a lot of the frustration with relational databases in general comes
from a misunderstanding.

People conflate "relational" with "SQL", because of the historical accident
that SQL is the most popular way to query relational data. Then when SQL isn't
a good fit for their problem, they think relational is not a good fit for
their problem, which is almost certainly not true.

The original motivation for relational databases is to have _path-independent_
access to data. This is a really powerful idea.

What I think we really need are better APIs around relational data access that
embrace real relational semantics, instead of dumping them in favor of an
object graph at the first opportunity (which throws away the path-independent
access).

~~~
xi
_People conflate "relational" with "SQL", because of the historical accident
that SQL is the most popular way to query relational data. Then when SQL isn't
a good fit for their problem, they think relational is not a good fit for
their problem, which is almost certainly not true._

For most practical purposes, SQL is the only way to query relational data. In
the absence of alternatives, it's natural to conflate the notions of SQL
databases and relational databases. I agree that SQL is a mess, but I don't
think an approach based on pure relational primitives would make it better; in
fact, I think SQL is a mess specifically because of lack of expressiveness in
the pure relational primitives. NULL, ORDER BY, LIMIT/OFFSET, opaque keys,
windowing functions, transitive closures, etc fit poorly into the relational
model.

 _The original motivation for relational databases is to have path-independent
access to data. This is a really powerful idea._

I agree with both assertions, but in many applications, path-based access
makes the total majority of queries, and SQL or relational model provides
little means to distinguish them from other equi-joins or arbitrary join
conditions. In my opinion, it would be better to start with a navigational
path-based database model, and extend it to allow constructing new paths
dynamically.

~~~
cico71
I'm not sure what you mean by fit poorly in the relational model.

ORDER BY, LIMIT/OFFSET have to do with presentation of data. So, although it's
highly desired that a language based on the relational model supports them,
they have nothing to do with the model per se.

As for opaque keys, I thought the old debate about surrogate/artificial and
natural keys was over years ago. The relational model has nothing to say about
it, that would be like trying to make a model understand if facts are true in
the real world or not. A key is a key.

Moreover, last time I've checked transitive closure operators were defined for
the relation model (people should not stop ad Codd papers, Date and Darwen
wrote a lot of books, e.g. The Third Manifesto, extending on the original
ideas).

NULL is a completely different beast and this is the only real thing one can
consider problematic.

In some cases NULL just means the predicate for the relation is different
because values for an attribute don't apply, so this is not really a problem,
in some other cases we simply don't know the value when we are collecting
information, and this is indeed a problem.

Date follows Wittgestein that said we should remain silent about things we
can't speak about i.e. we shouldn't collect incomplete information, Codd came
up with I-marks, A-marks and n-valued logic, SQL collapsed everything into
NULLs and 3-valued logic.

~~~
xi
_ORDER BY, LIMIT/OFFSET have to do with presentation of data. So, although
it's highly desired that a language based on the relational model supports
them, they have nothing to do with the model per se._

By fit poorly I mean that you cannot express most real-world business
inquiries using pure relational primitives without ORDER BY or LIMIT/OFFSET
and that's why I think relational algebra is not usable per se. SQL fixed this
problem by adding many non-relational constructs, but but without any sense of
consistency or direction.

I also strongly disagree that ORDER BY and LIMIT/OFFSET are presentational
operations since I often use them not only for wrapping the outer SELECT, but
also within correlated subqueries.

To show some proof, here are a few queries which are hard or impossible to
express in relational algebra:

1\. Show the blog post with the largest number of comments [^].

2\. Show the tags associated with the blog post with the largest number of
comments.

3\. For each blog category, show the 3 top blog posts by the number of
comments.

[^] If more than one exist, pick the latest.

 _NULL is a completely different beast and this is the only real thing one can
consider problematic._

I think NULL is only hard because relational model is a wrong way to look at
the data. If you see an entity attribute not as a column of a tuple, but as a
function from an entity set to some value domain, the fact that the attribute
is nullable just means that the function is not total. There is a well
developed mathematical apparatus for partial functions, in which NULL becomes
a _bottom_ value injected to the value domain, and tri-valued logic is simply
a monotonic extension of regular Boolean operators.

~~~
MattRogish
As long as you can represent your query with predicate logic and not violate
set theory (or other relational tenants), it's perfectly "relational".

It's important to note that folks have figured out how to extend Codd's
original relational algebra with stuff like aggregation. As the OP mentioned,
"Relational" doesn't mean "What Codd wrote in a single paper back in 1969". It
has continued to evolve, both with Codd's direct involvement and from
successors like Date, Darwin, and Pascal. Codd wasn't an all-seeing, all-
knowing data-management demi-god - but his general theory of relational
database management and the core tenants are still super awesome. Extensions
to it, as long as they don't violate the RM, are just as valid as Codd's
original work.

That means #1 is totally relational (as an aside, you don't need ORDER BY or
LIMIT for it either). Indeed, relational algebra supports aggregations
(<http://en.wikipedia.org/wiki/Relational_algebra#Aggregation>).

It's important to separate the query language from core RDBMS theory, as the
two are orthogonal. Codd suggested Relational Algebra as a reference language
but never intended for it to be the only way to communicate with a RDBMS.

See CJ Date's excellent discussion on ORDER(BY):
[http://books.google.com/books?id=WuZGD5tBfMwC&lpg=PA163&...](http://books.google.com/books?id=WuZGD5tBfMwC&lpg=PA163&ots=WeqU4pFMMA&pg=PA163#v=onepage&q&f=false)

EDIT: It even seems folks have figured out how to make "LIMIT" relational
operator: [http://stackoverflow.com/questions/10229535/relational-
algeb...](http://stackoverflow.com/questions/10229535/relational-algebra-for-
limit-operator)

I have not read the paper, so I cannot discuss the validity of the approach.

~~~
xi
_That means #1 is totally relational (as an aside, you don't need ORDER BY or
LIMIT for it either)._

I would love to see it. Yes, you can do it in SQL, but I'd say it's not easy
at all without ORDER BY and LIMIT or windowing functions and I don't know if
you can do it in Tutorial D. For the reference, #1 is:

 _Show the blog post with the largest number of comments. If more than one
exist, pick the latest._

The schema is:

    
    
        post(id integer, created timestamp)
        comment(id integer, post_id integer)
    

_See CJ Date's excellent discussion on ORDER(BY)_

I read it and the book as well, but I wouldn't call it excellent. What I read
there is a reluctant admission of failure to incorporate an important
operation to his query model. I see no attempt to analyze why it doesn't work
or adapt the query model to make ORDER a regular operation.

~~~
MattRogish
It's been a loooong time since I wrote any longhand Relational Algebra, so
I'll cheat and use SQL. All of this can be done pretty easily with relational
algebra primitives.

And we can make up any operator we want as long as it uses a primitive, so (to
save typing I created a view, but you could copy-pasta). I added a "Title" to
post because otherwise you could skip it entirely and just use the comment
table twice, but where's the fun in that?

<pre> CREATE VIEW counts AS ( select count( _) AS comment_count, post_id from
comment group by post_id )

SELECT id, title, comment_count FROM post p INNER JOIN counts AS c1 ON p.id =
c1.post_id WHERE NOT EXISTS( SELECT _ FROM counts AS c2 WHERE c2.post_id !=
p.id AND c2.comment_count > c1.comment_count ) </pre>

~~~
xi
Thank you. I accept your answer with the note that you ignored my request to
return only the latest post when there are more then one posts with the same
number of comments, but it's not hard to adapt you query to satisfy this
requirement.

However you can't do the same trick if I ask you to return the top _3 posts_
with the largest number of comments; or, to make the query more realistic, ask
you to return the percentage of comments generated by the top 10% popular (by
the number of comments) posts. Which is my point: pure relational algebra as
advocated by Date et al in Tutorial D is less expressive than SQL, which
probably explains the cold reception it got from the industry.

Edit: now that I think about it, you could do it without ORDER BY/LIMIT, but
still it's harder than necessary.

~~~
cico71
I was about to prepare the query without ties and yes it is a lot harder than
necessary and, more important IMO, much less readable/intuitive for people
than have to maintain the code and that is exactly why different SQL dialects
introduced ORDER BY/LIMIT/OFFSET/TOP/RANK etc.

But let me be a little bit picky about this and Date's view on ORDER BY.

The relational model deals just with the algebra/calculus without getting into
the details of a language based on the model.

In the book pointed out by Matt, Date explicitly states he's not saying ORDER
BY is not useful, just that it doesn't return a relation and thus it's not
included in the algebra.

However a language based on the relational model, like Tutorial D, can include
such an operator. To be double sure, I checked on The Third Manifesto V2 and a
LOAD operator with an ORDER specification is defined in the context of the
"special cased" support for arrays.

You can see a couple of paragraphs if you go here
[http://books.google.it/books?id=X85QAAAAMAAJ&dq=editions...](http://books.google.it/books?id=X85QAAAAMAAJ&dq=editions:IdjRpJFJmtQC)
and search for ORDER and LOAD (page 118).

TBH I'm not even sure it's a good idea to introduce arrays for ordering, but,
anyway, back to the quota queries: we agree that regular aggregation operators
are enough although the query becomes very complex.

In the same book referenced by Matt there's an exercise (7.14) showing how to
do a quota query and you can see that even in Tutorial D it's complex.

However, in the solution, Date & Darwen also propose something else: to add a
more specific RANK operator which is really just syntactic sugar to simplify
this kind of queries. With the important difference, compared to ORDER, that
it still returns a relation and not an ordered sequence of tuples.

Unfortunately the whole solution to the excercise is not available through
Google Books preview, and the operator is formally defined elsewhere, but you
can see how such RANK operator would work here
[http://books.google.it/books?id=WuZGD5tBfMwC&lpg=PA163&#...</a><p>Just a
final comment about Date being reluctant to analyze the matter, unfortunately
his work is disseminated in a lot of books (and he changed his position on
quite several matters throughout the years).<p>I love the "SQL and Relational
Theory" one but, having red all of his books, I would be hesitant to suggest
it unless one already knows Date. I think the latest edition (8th) of "An
Introduction to Database Systems" is still the best book to start with.

~~~
xi
_I was about to prepare the query without ties and yes it is a lot harder than
necessary and, more important IMO, much less readable/intuitive for people
than have to maintain the code and that is exactly why different SQL dialects
introduced ORDER BY/LIMIT/OFFSET/TOP/RANK etc._

Very good point.

 _In the book pointed out by Matt, Date explicitly states he's not saying
ORDER BY is not useful, just that it doesn't return a relation and thus it's
not included in the algebra._

My biggest gripe about ORDER BY, LIMIT and relational model is the fact that
while Date and others made some attempts to express these operations in terms
of relational algebra, they never (AFAIK) tried to do the opposite: alter the
relational query model to naturally support them. It's not hard: just replace
sets with sequences or arrays. It will gives you natural ORDER and SLICE
operators as well as new aggregates FIRST, LAST, NTH. It solves duplicates
without having to introduce bags, gives windowing functions for free and
probably better represents how modern RDBMS interpret a query. Another hint
why sequences may work better than sets is the fact that regular set
operations such as INTERSECT and UNION (as opposed to UNION ALL, which becomes
concatenation) are so rarely used in real-world queries.

I'm not even arguing that this is a good approach, but I think it deserves
some discussion and it appears they never even thought of a possibility of
changing the model treating it not as an instrument, but as a sacred
scripture.

~~~
einhverfr
First I don't know why order by, limit, offset, or windowing functions, can't
be said to return a relation if we define relations in a way which is
sufficiently useful to include these operations. In other words, they are used
in ways which returns sets of tuples (or sets of entities if you want to see
it that way), based on specific selection criteria.

I would thus agree that to the extent that these are not part of the
relational model this says more about the incompleteness of that model than it
does about the operations themselves.

~~~
MattRogish
Relations being defined what they are is not accidental; sets are not bags and
there are a lot of very good things that come out of a relation having a very
well defined, uh, definition.

It's not a matter of "usefulness" but of "well-defined" that allows us to
derive a whole lot of other interesting things.

Folks actually _have_ defined ORDER BY, LIMIT, OFFSET, etc. in terms of the
RM; it's just that the typical ORDER BY doesn't return a relation because of
ordering (sets are unordered by definition) and so there was a lot of
gymnastics they had to do in order to keep the set theory intact.

Sure, arbitrarily reordering is not a hard concept (or even implementation)
but to make sure you cover all the bases requires a significant amount of
work. A RDBMS is a complicated thing and you don't want to just add something
to it without doing proper due diligence.

One could argue that NULLs are more "useful" (I disagree) but the addition of
NULLs (a deceptively simple concept) has vastly overcomplicated SQL and lead
to a number of inconsistencies in the spec.

~~~
einhverfr
Sets may be unordered by definition but that doesn't mean you can't define
something interesting as an ordered set.

Consider the Pythagorean attempt to prove that all numbers were rational by
trying to prove that the square root of two was rational. That they were able
to prove that it was not rational meant that we ended up with a new category
of numbers. Similarly once you get into the square root of -1 you get into yet
another category of numbers designed to address that problem.

Our numeric model isn't complete with just rational numbers, or just rational
and irrational numbers. Today we have to add imaginary and complex numbers as
well. Why shouldn't we be expanding relational math in the same way?

------
RyanMcGreal
> There is a lot of truth to these charges, but such charges miss a vital
> piece of context. The object/relational mapping problem is hard.

I suspect that context is precisely what underlies the common critiques of
ORMs. Those people who best understand the inherent object-relational
impedance mismatch tend to be the very people who conclude that the effort
isn't worth it.

In my admittedly anecdotal experience, I have found that ORMs are the most
useful for the most trivial queries. For anything complicated, I find it
easier to drop into SQL and write the query directly than to compose an
equivalent object using the ORM - and that becomes increasingly so as the
query gets more complicated.

~~~
debacle
> Those people who best understand the inherent object-relational impedance
> mismatch tend to be the very people who conclude that the effort isn't worth
> it.

You're showing a severe bias.

> I have found that ORMs are the most useful for the most trivial queries.

That's really what ORMs are for. Three joins, max. Any more than that and you
really can't trust the ORM. It's a computer, after all, not a DBA.

~~~
RyanMcGreal
In other words, ORMs are at their most useful when solving the least
challenging problems. To which we must ask: why bother?

~~~
debacle
I'm going to make an argument here, and you might disagree but that's fine:

If you don't understand the purpose of an ORM, you haven't worked on a
sufficiently large enough software project or don't fully understand what the
definition of an ORM is.

If you need to write a report of any fashion, do not use an ORM. You can help
write the report with your ORM of choice, but at some point you need to get
down to the language that the database speaks because you're doing the kind of
set mathematics that ORMs aren't made for.

BUT, if you need to add a user permission, and you're doing anything other
than:

$user->addPermission( Permissions->get( 'CAN_EAT_CAKE' ) )

you're doing something wrong.

~~~
Philadelphia
But that doesn't require or imply an ORM. That's just a matter of having an
object model.

~~~
stevenwei
If your object model at some point gets persisted into a database, then you're
going to need to write code to (load from/save to) the database. At which
point, if it walks like a duck, and quacks like a duck...

------
columbo

        def user = User.findById(25)
        return user.getLocation().getAddress().getState() 
      

What happened during that execution? Was that 1 query or 50? Would it have
been better to write a specific sql statement?

To me, this describes the biggest challenges with using or not using ORMs. I
don't think there is an easy answer.

~~~
phillmv
>I don't think there is an easy answer.

Uhm, ActiveRecord at least will print the SQL it generated. If you think it's
inefficient, you can just… write your own sql that maps out to that query.
With Arel/scopes, they're also composable and lazy loaded. What's not to like?

~~~
judofyr
I find that the opposite way is way harder to debug: I see that a controller
causes a bunch of SQL queries, but where do they come from? In the view? In a
helper? In another library within the app?

~~~
gnaritas
Active record tells you where it came from in the log.

------
mindcrime
Because people are scared of things they don't understand? Or because they had
one bad experience, years ago, with some home-grown crappy ORM, and now they
assume all ORMs are bad? The reasons are legion...

The second of those is what I saw at my last job. The tech-lead / architect
guy was vehemently anti-ORM... because a previous group (at the same company)
had rolled their own ORM in Jython and embedded it into the Java codebase, and
it A. didn't perform well and B. was extremely fragile and hard to modify.
Now, you can't explain to him that there are well written, productive,
performant ORMs like Hibernate. He just doesn't want to hear it. _sigh_

~~~
jonstjohn
Or maybe people expect to interact exclusively through the ORM and when they
have to do something more complex, conclude that the ORM is useless. When
using an ORM increases the complexity of the code substantially over direct
SQL, I'll usually switch to more comprehensible SQL.

~~~
mindcrime
Well, at least in the case of Hibernate, it's easy enough to drop down to
native SQL when you need it (although you obviously risk giving up database
independence) and you can still keep the OOP "feel" by defining query post
converters and what-not.

The points made above about "don't use ORM for reports" aside (a point I
strongly agree with, btw), I've found Hibernate makes my life a _lot_ simpler
and rarely gets in the way or fails to do what I need it to do. But I do
occasionally create a native query to deal with some edge case or whatever.

------
einhverfr
I think the hate comes from two sides:

1) Relational folks who feel that ORMs don't really enable good relational
design, and

2) OO folks who feel that ORM's cause a lot of headache regarding relational
to operation data structure conversion.

The folks in camp 1 are usually absolutely right. Most ORMs encourage the
developer to design the database around the ORM which has to be an antipattern
when it comes to relational systems. This isn't always the case. DBIx::Class
has done some awesome work on some things like composite primary keys and the
like. But generally in most cases it is true.

On the OO side I am less convinced of the costs. So my complaints come from
the relational side. Yes object to operational data structure mapping is
_hard_ to automate, and it's even harder to automate well. So maybe we
shouldn't try. Maybe ORM's should be disregarded in favor of ways of designing
the data so that abstractions occur at different points.

In the LedgerSMB project we decided to use stored procedures to accomplish
this abstraction. We don't put "all the logic" in the database, far from it.
But database operations go through the stored procedures, and these are often
well tuned for performance (or as well tuned as they can be). The stored
procedures thus create an API for addressing relational data, and this allows
an abstraction layer to be formed which maps these into object methods. We'll
probably extend this further in the future to tie object properties to complex
database types, but we haven't decided the best way to do this yet.

ORMs get a lot of hate because they tackle a very difficult problem and
consequently tend to produce results that are unsatisfactory particularly to
db folks. They are the product of app designers looking at the db, more often
than not, rather than db designers looking at the app.

~~~
sjwright
> In the LedgerSMB project we decided to use stored procedures to accomplish
> this abstraction.

Why? Stored procedures are nothing more than a bolted-on way to store SQL
queries on the database server. Yes, you've create an "API" and an
"abstraction" but you've done nothing you couldn't have done in any other
language.

> The stored procedures thus create an API

I consider stored procedures to be little more than a shitty "API builder"
thrown in for free with your database.

* The programming language generally sucks

* Passing in complex data always sucks

* Source control is non-existent (or sucks)

You've probably had to write an API layer within your code anyway, to deal
with the remaining logic requirements, why not store those queries within that
layer? Or put them in a further API layer within your own source code?

~~~
saurik
With PostgreSQL the programming language can be almost anything you want (I
have some stored procedures written in Python), the data structures can be
reasonably arbitrary (you can build your own data types easily, the better
database libraryes already support arrays and dictionaries, and in 9.2 JSON is
being shipped by default), and the source control problem is easily solved by
having pg_dump output your stored procedures to a file.

The key advantages are then that 1) if you need to take the output of one
query and use it in another--especially if they need to be in the same
transaction--you can bring the round-trip time down to nothing and 2) this
part of your abstraction can now be modified as part of a transaction, making
changes to table schemas (which PostgreSQL supports as parts of transactions)
something to do painlessly and without downtime on production.

------
mattmanser
I personally haven't heard/seen anyone griping about ORMs for a couple of
years now. I thought the debate was over. Much like you rarely see anyone talk
about stored procedures these days.

~~~
GFischer
Please do come and have a talk with my (pointy-haired style) boss :) .

I take it you think stored procedures are generally good (at least that's what
I think). But some people still believe that they'll migrate database
architecture or some such. Also, some ORM's aren't fit for the task and give a
bad name (Microsoft's Entity Framework v1 was especially horrible)

I'm definitely out of touch and I spend way too much time working with
Microsoft SQL, but do people treat relational databases as "a crazy aunt" as
Fowler says? (I still haven't tried NoSQL)

~~~
mattmanser
Re: SPs, no, I meant people used to say all your data access should be through
SPs, which seemed to turn about 5-7 years ago as the general consensus moved
to parametrized queries and freedom from DBAs overseeing your changes. I
haven't seen anyone say that in a long time now. I used to use them for
complicated TSQL and helper functions, like some crazy recursive tree building
stuff, but these days I haven't actually written one now in a couple of years.

On a more individual level I'm also now finding myself even shying away from
complicated SQL queries. I'm finding these days they're unnecessary 90% of the
time. Often it's actually faster and more maintainable to pull out a larger
amount of data and then do the more specific calculations in code. Even with
good indexes, etc.

~~~
GFischer
Heh, I'm probably 15 years behind then :) . Thank you for the reply :)

I strongly dislike writing "strings with sql" inside my code, though I do use
parametrized queries these days.

I do appreciate freedom from DBAs as a feature :) but I have access to my
stored procedures these days (though I don't know for how much longer).

I work for an insurance company with an awful legacy database structure, with
mnemotechnic table names like S0001... to S99999, and disregard for naming
conventions or even normalization sometimes. It's not making me a better
developer :( unless learning by bad examples counts.

------
ken
> In the 90's many of us (yes including me) thought that object databases
> would solve the problem by eliminating relations on the disk. We all know
> how that worked out.

Maybe I'm too young (or too old) because I don't know what he is suggesting
here. I've used an object database on a large commercial project, and it was a
dream: everything I like about ORMs, and none of the drawbacks.

My impression of OODBs, therefore, is that they were a technical success, but
a commercial (and open-source) failure. But I don't think that's what Martin
is suggesting. Can someone explain what "we all know" about database history
here?

------
soulclap
I think it's all about getting to know the tool (ORM) you work with. Look at
the generated queries and their performance, use eager/lazy loading in the
right places and mainly take care of the database operations that run most
often. Switching to plain SQL might make sense if you don't need
structured/hierarchical objects at this point in your application and a simple
array/list/table will do just fine. Don't bother with that if your hand-
generated query will most likely end up the same though. Use database-specific
bulk imports or plain SQL if you have lots of data to insert and need it fast.

I am also often seeing comments about ORMs saying "What's so hard about
SELECT..." - nothing's really hard about retrieving data, every decent
developer will sooner or later end up with a working complicated SQL that gets
them exactly what they wants. But I think the real benefits of ORMs start when
you have to save/edit/update/extend relational data again and have to take
care of all the inserts, updates, deletes across several tables, maybe
including versioning or workflow (drafts or changes waiting for approval) and
other additional complexity on top. That's the parts where I am glad to rely
on an ORM supporting relationships and which can be extended via hooks or
behaviours. (Note that I am mainly talking about 'Admin' pages here and
nothing that runs too often.)

------
krig
"You have a relational mapping problem. 'I know', you say. 'I'll use an ORM.'
You now have two problems."

That just about sums up my experience with ORMs. Of course, like all things in
the real world, experiences vary. However, I do think that generally, ORMs
solve none of the difficult relational mapping problems and adds another layer
of abstraction that complicates things like performance tuning to the point of
negating any other benefits they may bring.

~~~
brazzy
Have you actually read TFA? Fowler's point is that yes, ORMs don't solve all
of the really hard mapping problems, but save you a lot of boilerplate on the
_other_ 80-90%, and the hallmark of a _good_ ORM is that it allows itself to
be bypassed with relatively little hassle for those hard problems (like
performance tuning).

Also from TFA: what do you suggest using instead?

~~~
mgkimsal
Come on - how hard is it to write

"select id, username, email, nickname, date_registered, pass_hash, first_name,
last_name, middle_initial, home_phone, mobile_phone, work_phone from user
where username=?" and manually do escaping whenever you want user info?

You seem to be suggesting that

user.findByUsername('joe')

is somehow more worthwhile or usable. It's certainly more noble, because an
ORM doesn't solve 100% of a set of problems 100% of the time. _OBVIOUSLY_ it's
better to never use them, ever. In any circumstance.

"select id, username, email, nickname, date_registered, pass_hash, first_name,
last_name, middle_initial, home_phone, mobile_phone, work_phone from user
where username=?"

is really more readable because I always want to know every single piece of
data, and read it every time I look at code (and update all my SQL queries
every time I make a schema change) all the time. I'm finding that my monitor
is often devoid of code - I have too much space on the screen just begging to
be filled up with useful boilerplate code so I don't ever forget all the
column names in every table.

~~~
zzzeek
copying my answer from virtually the identical commentary made on reddit ("why
should I use an ORM when I can just write the SQL?") (not to mention Fowler's
article makes this same point pretty clearly):

ORMs do not claim to be "simpler than SQL" and this has nothing to do with the
purpose of an ORM. There is nothing that's ultimately "simpler than SQL", if
in fact you aim to just write SQL.

The purpose of the ORM is to marshal data between an object-oriented domain
model and a relational schema, to translate a wide array of common relational
patterns with corresponding domain-model patterns (think foreign key to
collections, for example), and to express relational queries in terms of the
object model at least to some degree.

All of this is well outside the realm of just "writing SQL queries". The ORM
is a tool used to integrate relational databases with an object oriented
application layer, and to that degree they make this task a lot less effort
than doing it by hand.

------
antihero
I've used both Django ORM and SQLAlchemy and they are generally excellent,
allowing you to use SQL for anything that's just a bit too complicated for the
ORM, but greatly simplifying day to day dynamically generated queries.

------
chris_wot
I've always wondered what the mathematical basis behind OOP is. I mean,
relational databases are grounded in first-order logic and set theory. If
there was a mathematical theory that grounds OOP, then I'd love to know
because then there may be a way of finding a mathematical basis for ORM.

Anyone who wants to provide the answer to this one would be my new best friend
:-)

~~~
Ygor
Maybe this can help:

[http://reperiendi.wordpress.com/2007/11/03/category-
theory-f...](http://reperiendi.wordpress.com/2007/11/03/category-theory-for-
the-java-programmer/)

------
compay
The single biggest disadvantage of ORM for me is that in exchange for easier
programability, they force you to learn a proprietary, arbitrary, ad hoc DSL
that is less consistent, more complicated and less powerful than SQL.

~~~
wiradikusuma
It depends on what "proprietary" means. In Java, ORM is pretty much
standardized with JDO/JPA.

~~~
vegas
or HQL. which is pretty much exactly what this guy means.

------
j45
My why is more of a feeling/observation/theory so please feel free to share
your thoughts and feelings:

Some developers hate, and are almost threatened when something fancy and
complex they might do is trivialized, thereby trivializing them.

There is no question that something like ORM makes some things easier, and
other things harder. It applies equally well to SQL and which flavor of
database you end up using, or whether you're using ActiveRecord, etc., or not.

I would hope developers who find something that frees to solve user problems
instead of technology problems would be a good thing, but maybe it's just me.

------
narag
I don't like the _implementation_ of ORMs as run-time layers. I very much
prefer using a tool that generates code. It's orders of magnitude easier to
maintain... for me anyway.

------
duggan
There's some serious miscommunication going on every time this discussion
comes up.

A lot of people seem to conflate _all_ data abstractions with the term "ORM"
(which has a very specific meaning), therefore assuming everyone who thinks
ORMs are a bad idea are arguing against all forms of data abstraction.

This is not the case.

One of the better articles on this:
<http://seldo.com/weblog/2011/08/11/orm_is_an_antipattern>

------
stevenwei
I really have never understood the ORM hate. I've found them to be immensely
useful in 99% of circumstances, and for the remaining 1%, a good ORM will
always let you fall back to raw SQL. Aside from providing a simpler syntax for
performing basic queries, there are a few features that ORMs provide that have
greatly simplified my life:

1\. Automatically using prepared statements and validating/escaping query
arguments to prevent injection. You have to be quite a bit more careful when
you're working with raw SQL.

2\. Providing an clean API to construct complex queries.

This typically becomes an issue when you have a query where you are filtering
and/or sorting by multiple fields which are specified by the user, some of
which are conditional. If you're writing raw SQL you end up needing to do a
lot of string manipulation which can get fairly messy (and makes the code more
difficult to understand). An ORM which provides some sort of query builder
syntax that lets you do:

    
    
      if (some condition):
        query.addWhere(clause)
      if (some other condition):
        query.addWhere(other clause)
      etc.
    

is pretty convenient.

3\. Collating repeated result rows from joins

Whenever you're working with joins you end up with repeated data in your
result set, which you generally end up having to collate before display. For
example if I have _Recipes_ <\- (many to many) -> _Categories_ and do a query
to load the both of them, I might end up with something like this:

    
    
      recipe_id | recipe_name  | recipe_ingredients | category_id | category_name
      --------------------------------------------------------------------------
      1         | Shrimp Pasta | 1 cup tomato sauce | 1           | Pasta
      1         | Shrimp Pasta | 1 cup tomato sauce | 2           | Seafood
      1         | Shrimp Pasta | 1 cup tomato sauce | 3           | Shrimp
      2         | Fruit Cake   | 4 cups flour...    | 4           | Dessert
      2         | Fruit Cake   | 4 cups flour...    | 5           | Cakes
    

Without an ORM I have to loop through the result set to re-format the data the
way I wanted before displaying it. An ORM takes care of that for me and gives
me back 2 _recipes_ with their _categories_ accessible via
_recipe.categories_.

4\. Simplified manipulation of many-to-many relationships. Following the above
example, if I want to add a new _Category_ to a _Recipe_ I can simply do:

    
    
      recipe.categories.add(category)
    

If I want to set the categories to something entirely different, I can do:

    
    
      recipe.categories = [category1, category2, category3, etc]
    

Without the ORM I would have to manually sync up the entries in the join table
which is kind of a pain in the ass. Working with join tables in general is
rather obnoxious, so I'm quite glad that the ORM takes care of that one for
me.

5\. Some ORMs give you notifications when an object (or collection of objects)
changes. This is pretty important on the client side when you want to make
sure the data you're displaying stays up to date, even as it is being
manipulated. For example: if I'm viewing a _recipe_ on my iPad and I update
that _recipe_ on my desktop. A background thread is running which keeps the
two synchronized, and at some point the underlying _recipe_ is updated in the
database on my iPad.

If I'm working with raw SQL there's basically no way to know when that object
is changed (short of polling it periodically, or rolling your own notification
system). But my ORM will keep me notified of changes to the object so I can
refresh the user interface with the updated recipe after the sync completes.

6\. Some ORMs implement a _unit of work_ that allows you to track what changes
have been made to an object since it was retrieved from the database. So you
can easily see which fields have been modified, and then when you go to save
the object back out, it will intelligently only issue the SQL to update the
columns which have changed, or won't even touch the database if nothing has
actually changed.

7\. Some ORMs put their objects into an _identity map_ , so if you query for
the same object under multiple different scenarios (e.g different areas of
your UI), you always get the same underlying instance back. This means that
you don't have multiple copies of what is semantically the same object
floating around in different places of your app, and the object is always up
to date with the latest changes.

Note: my use case is typically client-side database backed software, so
features like (5), (6), and (7) save me from having to do a TON of work. If
you're doing more web oriented stuff, I can see how those particular features
may be less useful to you. Still, I think ORMs are a huge win overall.

Of course, none of this absolves you from needing to know what's going on at
the database level. You still have to know what SQL your ORM is generating in
order to make sure you're using it correctly. But I get really confused when
people badmouth ORMs and try to tell me that it's simpler to use raw SQL,
cause it never has been for me and my use cases _. And the funniest thing is,
if I stuck with raw SQL while attempting to solve all of the problems I listed
above, I probably would end up with a half-assed version of a full-fledged ORM
anyway.

(_The only use case I can think of where I'd prefer to use raw SQL over an ORM
is with report generation type activities: usually those types of queries
aren't very dynamic, they can often be too complex to be expressed via the
ORM's API, and once you have the data you're just dumping it to display
without worrying about interactivity anyway.)

~~~
einhverfr
So now suppose we refactor your database a bit, since all cakes are deserts
and we don't really need to store that the fruit cake is both a cake and a
dessert.

Suppose we make category have a self-join (add a parent_id field that
references category(id)).

Now we want to make sure that when we want to list all deserts, all
subcategories are listed too. Assume the possibility of arbitrary depth.

How easy is tht to do in your ORM?

~~~
stevenwei
Good question. I can't speak for all ORMs, but both SQLAlchemy and Core Data
provide direct support for self-referential relationships

SQLAlchemy lets you drop down to raw SQL as well if you need to, while still
taking care of mapping the result set to objects for you.

Core Data is less powerful (and technically not an ORM), but it's the defacto
standard on iOS so I'm pretty much stuck with it.

~~~
einhverfr
So if you have self-referential relationships of arbitrary depth then can it
generate clauses like WITH RECURSIVE or CONNECT BY? Or are you functionally
limited to one level of self-joins without dropping to SQL?

~~~
stevenwei
WITH RECURSIVE is supported in SQLAlchemy now, although the syntax is rather
verbose. Not sure about CONNECT BY, I think you may need to drop to SQL for
that.

~~~
einhverfr
that's pretty cool. (I would assume that since WITH RECURSIVE is the standard
and CONNECT BY is Oracle's invention that either it would be handled by db-
specific extensions or just be another cost of running Oracle)

------
latch
Surprised by the pro-ORM comments so far.

My hate for ORMs grew proportionally with my usage of NoSQL. It's such a
pleasant experience when your domain model fits the data model. Graphs are a
good example, but Redis is a better one. When your intention fits a Redis data
structure, it's just programming bliss - a few lines of explicit and simple
code.

The only way I can use a relational database now is with Sequel.

~~~
polyfractal
I'm confused by your statement. When your data fits the tool that was designed
to work with that data...everything is great? Isn't that the point of using
the right tool for the right job?

~~~
latch
What's confusing? For years...decades...developers have been taking a one-
size-fits-all approach to data storage - relational databases.

Now that we are starting to make use of more specialized tools, we're starting
to go back and question (or hate) our past approach.

------
mjburgess
That is really bad ideological myopia there. He refused to consider
alternatives (for which I would adovcate functional design or something like
sqlalchemy).

"Actually I think this is an inevitable consequence of using a relational
database - you either have to make your in-memory model more relational, or
you complicate your mapping code."

No its an inevitable consequence of trying to think with an object-oriented
type system to describe a set-based data collection! In other-words the
problem is attempting ORM: he's begging the question!

I would expect better from Fowler, but it does go to show the futher into an
ideology you get (heavily design-patterned OOP in fowler's case) the more you
see design and programming problems as problem with OOP rather than anything
more fundamental.

------
bitdiffusion
Adding to some of the anti-orm sentiment above: I always find that orm's are
trying to solve a problem that doesn't really exist. The proportion of time
(and therefore cost) spent on a project writing SQL is orders of magnitude
less than time spent on things like performance, scalability, user-experience,
testing etc. You might have something that only maps data from your results
into your language of choice but, again, if small, slightly repetitive tasks
like this are affecting your ability to deliver your software, you have bigger
problems.

Sure - database design is important - but that you have to do that regardless.

------
debacle
For the same reason PHP gets so much hate - both are tools with valid uses,
but because they make hard things easier for people, that means that people
who have no idea what they're doing can do very bad things.

Hating on a tool is ridiculous.

~~~
zzzeek
The core of fowler's article is, "what is the alternative to ORMs?" - and the
answer is there isn't much other than rolling your own.

This is not at all analogous to PHP where there is an abundance of
alternatives to it's mediocre design. PHP isn't hated because of its concept -
a "web-based scripting language". It's hated because it's done very poorly.
Just like a lot of bad ORMs Fowler refers to.

~~~
debacle
There is nothing wrong with PHP - it runs multi-billion dollar companies and
it is the most ubiquitous language on the web for a reason.

~~~
hzy
Because it has near-zero barrier to entry. That doesn't speak highly of PHP as
a language, just on how its deployment strategy is designed.

~~~
debacle
I don't think Facebook or Yahoo chose PHP because it was easy to write.

~~~
Cloven
you are incorrect.

~~~
debacle
Yahoo's rationale for choosing PHP:

"The short development cycles needed to stay ahead of the competition demand a
web-centric scripting language that is easy to maintain and update."

[http://public.yahoo.com/bfrance/radwin/talks/yahoo-
phpcon200...](http://public.yahoo.com/bfrance/radwin/talks/yahoo-
phpcon2002.htm)

They didn't choose Python, or Ruby, or Java, because none of those languages
are as expressive or shut-up-and-get-out-of-my-way as PHP. Just because a high
schooler can write PHP doesn't mean it's a toy not also meant for world class
engineers.

~~~
Cloven
PHP is more expressive than python, ruby, or java. I see.

------
dgreensp
_Not all problems are technically suitable for a NoSQL database._

You might as well say "I'm a big fan of JSON, but not all problems are
technically suitable for a NoXML data format." There's nothing magical about
SQL, as proven by the fact that so few apps actually use its abstractions! The
reasons to use SQL today are pragmatic; battle-tested engines, tools, etc.
Most developers will never write their own ORM or database, but the ones who
are up to it are starting to explore the design space quite profitably.

------
tomkin
Throughout history there has always been _some_ hate towards a platform that
inherits and simplifies another platform. In truth, it would be more efficient
to talk directly to the processor, but that takes a lot of time. And sure, ORM
isn't the fastest way to data, but the deficit is well made up in the end
result. You can't utilize ORM without an understanding of data. Suggesting
that developers are leaving semantics behind is absolutely not true in most
cases.

------
vegas
I think it's basically the O part. In theory, one might have a delightful
ontology of object oriented code, like Animal->Mammal->Cow , but in reality
you often have something much more like

CompanyNamePersistentObjectBaseClass->Entity->ExtendedPersistentEntity->CacheFactory->Entity(but
in a different
package)->NeedToAddAPropertyForJustThisReleaseIPromiseThisClassIsGoingAway->IdAddThisToTheBaseClassButIReallyNeedToRelease->Shape->Triangle->XYCoordinates

and then somebody wants to cache these things in hibernate.

~~~
sjwright
Your comment is screwing up the page layout. (Firefox, Win7)

------
anuraj
I for one never touch an ORM if possible. Coding for SQL in code is not that
difficult, but sure it leads to SQL lock in. But you don't expect to change
your underlying data model and scheme so often, and if you ever do, you most
probably need a complete redesign.

------
eternalban
He comes close but still misses the overall general issue and condition.

These are facts:

\- Relational and/or heap representations can be fairly universally
represented e.g. heap on FS aka VM

\- "impedance mismatch" problem is peripherally about latency and more
generally present regardless of heap/store (hey..) image representation medium

So in general:

Any in-memory store process that is distinct from an in-memory heap process
will still face the "impedance mistmatch" unless

a) a compiler bridges the semantic gap between the two representation domains

b) a mediator provides an interaction means between the two distinct processes

A very specific case of this general condition is:

[in-memory object language runtime] <\--driver--> [FS based relational DB]

Fundamentally the condition is that of 'networking' so the question
immediately presents itself as to "what is distinct from general networking
that is causing the tedium of mapping?".

Ad-hoc messages of potentially unseen semantics would certainly cause an
"impedance mistmatch" in a networking setup. Think of the nightmare of
infrastructure to do the boiler plate, etc. (Look familiar?)

So, queries are the raw nerve that is exposed by this type of coupling.

What is a query? What is (the meaningful consequences of) a query in context
of heap and/or store?

A heap typically maintains a 'collected assembly of facts' e.g. a set of
related objects (eternalban's comments trail collection object and various
other data) all nicely stitched up by a static compiler and/or dynamic
runtime. Same set of facts is distributed in a potentially unique manner
across an unknown set of store media. A query in store is 'work'. It is also
more precisely 'a functional computed in a specific time and space'. (CAP
anyone?)

So we know the answers to each and every one of these adjacency issues in the
simple form. The runtime memory object model process <-> RDBMS is a very
common system pattern that unfortunately exposes a whole truck load of these
difficult interfacing issues in one lump scary form. It is a lot of 'work'.

What is necessary to get around this is entirely against the prevailing trends
arguing for "simplicity" as this requires infrastructure level plumbing for
code at the same level of care and attention as that given to analogous
networks for IP.

tldr; ORMs (or any other magic layer) for independent actor system A-B
coupling works best when the interaction patterns are bounded and statically
known. Any increase in runtime uncertainties will necessitate working
partially outside of the framework hand-holding and may even be more tedious.
Choose your tools appropriately.

