

ORMs don’t kill databases. Developers do. - jagreehal
http://www.arrangeactassert.com/orms-dont-kill-databases-developers-do/

======
raganwald
Summary: _ORMs are a leaky abstraction_.

Thus, it is true that developers are responsible for knowing what is going on
under the hood and either contorting how they use the ORM to get what they
want or skipping it when necessary.

And it is also true that ORMs are to blame, they do not work as advertised,
and often impose extra conceptual overhead because you end up composing SQL in
your head and then trying to figure out how to get the ORM to generate the
SQL, instead of thinking in the ORM's abstraction.

~~~
skittles
That may be true, but that's what programming is for the most part. Pretend
like it will just work and then optimize later when it is proven to be too
slow through profiling. In this case, you need performance data from the
database. Use that data to improve anything that needs it.

~~~
rick_bc
Does it work 99% of the time though?

------
mdellavo
"a bad workman always blames his tools"

ORM's are just a tool, some better than others. Developers need to know SQL,
relational algebra and normalization. There is no excuse for not knowing
proper databasing, ignoring the SQL generated by your ORM and blindly
following along. A good ORM (like SQL Alchemy) will grow with you from the
simplest of cases to the complex.

Personally, I have never developed an application with an ORM and ignored the
SQL generated. Most of the time the SQL generated by the ORM is acceptable but
there are instances where the naive approach with an ORM generated piggish SQL
and needed to be reworked.

~~~
seliopou
I think your last point is right. It's not just about knowing your tools, it's
about knowing how your tools fit into the system that you've built.

------
escoz
You always hear this from somebody who is new to ORMs, spends 3 months working
on a project, and then is amazed because he doesn't like one thing or another
in the file.

It'll be funny when he discovers that with just a few other lines of code he
can not only fix that issue, but also completely change how data is stored in
the database, use 1st and 2nd level caching for the objects, and be able to
easily refactor all that stuff when the business logic changes.

By the way, I seriously doubt that query is the cause of performance problems
in his system.

------
generalk
Spot on. ORMs can generate some crazy-bad SQL if you're not careful with them.
A good ORM will generate great SQL in the common case and allow the developer
to execute custom SQL without fuss when necessary. If your ORM makes it a pain
to pump in raw SQL in place of generated SQL, find a new ORM.

~~~
escoz
"beautiful" sql is subjective. What does it mean? SQL that looks pretty to the
eye (with proper indentation)? Or is it SQL that can run on dozens of
different database servers with no modifications? Or is it SQL that can be
automatically generated so you don't have to spend any time thinking about it
but that runs 5% slower than something you spent 1 hour on?

I would probably go with the two last options.

------
clarkevans
The typical use of ORMs that kills the database is where one uses record-based
operations to traverse a set: resulting in one query per record.

The blogger's point is quite good: a developer probably doesn't know what SQL
is being generated, it may be opaque how small variants in query source can
result in radically different SQL based on the translation process. Hence, a
profiler should be used to find problematic queries. This is solid advice.

What's not mentioned though is that sometimes what appears to be very ugly SQL
actually out-performs pretty, parsimonious, hand-crafted SQL. This, assuming
the developer could actually craft correct SQL equivalent.

------
arohner
ORMs aren't the problem here, databases not designed for use by programmers is
the problem.

ORMs exist because straight SQL isn't a good interface for programmers, and
ORMs are bad/a leaky abstraction because the problem is really messy.

Instead, let's just build a relational database that 1) has an API that is
useful to programmers i.e. not SQL 2) supports explicitly specifying a plan,
not providing hints or clues, but using operations that look map, filter,
reduce, hash-join, etc.

~~~
russell
Wrong. Relational databases are one of the fundamental components of most
large scale applications and SQL is usually the interface. If you are going to
be a journeyman or master programmer in this area you must understand it. If
you dont, you are just an apprentice. Just like you must understand
networking, concurrency, algorithms, data structures, and hardware. If all you
understand is Java an its brand of OO then you are a trade school mechanic.

I know this is snarky as hell and I apologize for being offensive. But I am
not an MIT CS graduate. I am completely self taught. SQL has been a necessary
skill in every job I've had in the last 20 years. So when users were
complaining were complaining that a data download to csv files was limited to
200 records (due to a dozen joined tables, lazy loading, table locking, using
velocity to generate csv files, etc), I threw out Hibernate and velocity and
used SQL, JDBC, and direct HTTP output to generate 50 times the output in one
tenth the time.

~~~
escoz
Just because you have an example of a probably poorly designed, over-
engineered system that could be completely replaced with some hand written SQL
code, doesn't mean that is the best solution for everything. Come back when
you have an example of a system running on top of 500 tables, doing read-
writes in a big iron server, communicating with dozens of other servers, that
uses distributed caching to improve performance, and that you can explain how
you can replace an ORM with SQL.

~~~
russell
My point was not that ORMs are bad, but that SQL is required knowledge to
communicate with a databases, including getting around the problems an ORM
causes. It is hard for me to believe that someone who doesnt understand SQL
can be a competent ORM programmer.

The system I mentioned was well designed with hundreds of tables, multiple
replicated database servers, dozens of web servers, caching and all that. The
problem was a specific case where knowledge of SQL was able to produce a 50x
performance improvement.

I think SQL essential to learn because it bends your mind. You have to think
in terms of sets instead of sequential processes. Just like functional
programming and BNF are mind stretching.

~~~
escoz
cool, we agree on that.

------
emiller829
"Error establishing database connection"

ORMs don't kill databases. Hacker News does.

~~~
jagreehal
Sad but true! Bluehost should have sorted the issue...

------
ap22213
Using ORMs gives developers (and other stakeholders) a way to get through the
initial conceptual design and development faster. It helps getting to
'functional' much faster and it makes feedback loops very effective.

However, before pushing to production, you should always have some idea of
typical user scenarios. What actions are they taking? With how much data? How
often? In what sequence? Then, you load test based on that information. You
identify that 20% of the ORM isn't going to work, and you replace it with
stored procedures, or in-memory tables, or something.

For someone who grew up pre-ORM with on-staff dba-types who had to redesign
their schemas on every little feature change, and thus recreate entire data
layers, ORMS are FREAKING AWESOME.

But, you still need someone who knows what they're doing.

~~~
escoz
I agreed with you until the point that you said to replace things with stored
procedures and in-memory tables.

Having the two different contexts for logic, code and SPs makes things a
thousand times harder to debug and maintain, making the cost of anything
skyrocket.

ORMs are indeed awesome, but the solution for problems like this is not to
take ORM out of the way, is to work with it to make it work better.

~~~
ap22213
Why is that more difficult? It seems like it worked fine for two teams I
worked with. What are other common or better ways to get around this?

~~~
escoz
the big problem is that it makes refactoring and changing logic much harder,
as you now two different places to maintain business logic.

In my experience, most problems that DBAs would like to use SPs for can be
easily solved by refactoring/merging SQL calls and proper caching of objects
during a request or session, two things that an ORM really help you with.

------
kevingadd
"Error establishing a database connection"

Heh.

Anyway, cached text from Google:

\----

Following a recent post I wrote Phillip Haydon made an excellent argument
about not using LINQ in ORMs like nHibernate and Entity Framework without
consideration to what they are doing.

Something I definitely agree with.

    
    
        Just as guns don’t kill people, ORMs don’t kill databases. Developers do.
    

I should have picked up on this because it’s happened to me on a project I was
working on. Death by ORM During development everything was performing well. So
well that even the most sceptical developers decided to use ORMs instead of
stored procedures.

The application was responsive and users were happy.

However in production there has no happy ending. The application went from
running like sh!t off a proverbial shovel to just sh!t.

It was a nightmare and the finger pointing began.

The problem was we assumed (always a bad thing) the queries generated by the
ORM would be as good as if we had written them ourselves.

So when we looked at what was being generated it came as a bit of a shock to
see what was going on.

But while it was easy to use the ORM as a scapegoat the reality was developers
were at fault.

Here’s a simple example of how even the smallest of changes can make a
difference in the SQL ORMs generate.

In the code below all we want nHibernate to do is select the of the fruit that
is NOT also a color.

    
    
        [Test]
        public void Tale_Of_Two_Queries()
        {
            var fruitDatabase = new FruitDatabase();
            using (ISession session = fruitDatabase.Session)
            {
                using (ITransaction txn = session.BeginTransaction())
                {
    
                    var apple = new Fruit() { Name = "Apple", IsAlsoAColor = false};
                    var orange = new Fruit() { Name = "Orange", IsAlsoAColor = true };
    
                    session.Save(apple);
                    session.Save(orange);
                    txn.Commit();
                }
    
                var firstQuery = session.Query<Fruit>()
                .Where(f => !f.IsAlsoAColor)
                .Select(f => f.Name)
                .ToList();        
    
                var secondQuery = session.Query<Fruit>()
                        .Where(f => f.IsAlsoAColor == false)
                        .Select(f => f.Name)
                        .ToList();
            }
        }
    

For the query that uses !f.IsAlsoAColor the SQL query looks like this

    
    
        select fruit0_.Name as col_0_0_
        from   Fruits fruit0_
        where  not (fruit0_.IsAlsoAColor = 1)
    

For the query that uses f.IsAlsoAColor == false the sql generated looks like
this

    
    
        select fruit0_.Name as col_0_0_
        from   Fruits fruit0_
        where  case
                 when fruit0_.IsAlsoAColor = 1 then 1
                 else 0
               end = case
                       when 0 /* @p0 */ = 1 then 1
                       else 0
                     end
    

While this is a trivial example it highlights the fact you don’t know what the
generated SQL will be, let alone the execution plan.

And if your query returns the results you want are you really going to spend
time finding out what’s going on ‘under the hood’? How comes nobody talks
about this?

I can’t ever recall a presenter warning about this during any ORM talk I’ve
been to. The focus always on what it can do and how it compares to other ORMs
rather than best practices. It’s not just a .Net problem either

After talking to Ruby on Rails developers this is a problem for them too,
because Active Record doesn’t work 100% of the time. When it goes wrong they
have to implement workarounds to execute queries without using active record.
So how can I make sure it doesn’t happen to me?

The first thing is to know what issues to look out for. The nHibernate
profiler alerts page is an excellent resource for this.

With this information you can use profiling tools to see what queries your ORM
is generating and how your application is performing.

My last tip is to always use the best data access method for what you’re
doing. It’s easy to become complacent or aim to everything using an ORM, but
if it makes more sense to use a stored procedure then that’s what you should
do.

It would be good to hear what your experiences are with issues like this and
to share any tips/tools you have used to avoid ORM performance issues.

In the meantime I’ll leave you with this excellent DBA vs. Developer (Star
Wars Style) video

~~~
clarkevans
What seems to be suspicious SQL (the CASE/WHEN/ELSE) is probably not the
problem. This is classic boilerplate SQL "cast" code to make a boolean value
for databases that don't support booleans. Distracting, perhaps, but not a
smoking gun. Likely the problem with something like this is that there's a
full-table scan on the Fruits table, and that the function doing the scanning
is called frequently without caching.

~~~
dmethvin
> What seems to be suspicious SQL (the CASE/WHEN/ELSE) is probably not the
> problem.

It certainly _could_ be the problem. Like you say, a CASE becomes a computed
field and requires a table scan. A simple check for IsAlsoAFruit can be done
with an index scan, assuming it has that index. This can be a huge difference
in data scanning and memory load. Of course, there are also plenty of times
when SQL will spurn a boolean index because it's not selective enough.

------
geekam
"Just as guns don’t kill people, ORMs don’t kill databases. Developers do."

Guns do not kill people but, they sure do facilitate killing, don't they?

------
adelevie
If you're building a Rails app, just profile your queries (perhaps with New
Relic) before pushing to production.

~~~
seliopou
I don't think that NewRelic would help in this situation, as they only display
SQL queries and the corresponding execution plan for slow queries in slow
transactions. While this is certainly helpful, the blog post is pointing out
the need for developers to know what their ORM is doing from the start, not
just when a performance problem begins to manifest.

Put another way, if you have good visibility into your system early on, and
understand what's going on under the hood, you can avoid performance problems
in the future.

~~~
escoz
I agree with what you said about developers understanding what does on under
the curtains, but regarding the NewRelic statement, I couldn't disagree more
with this.

Spending time early on in the project on things that don't matter and that
don't really impact the system, like the example given in the post, is exactly
the kind of thing that ends up creating projects that are expensive to
maintain and just bad.

Focusing on solving large performance bottlenecks, that can easily improve
things 1000 times more than fixing that simple example is a much better use of
your time, and is what newRelic shows you.

~~~
adelevie
This. At almost every level of web development, we're working with code that
generates some other kind of code. Ruby -> C, Erb/Haml/whatever template =>
HTML, maybe Coffeescript -> JavaScript, and finally ActiveRecord/any ORM ->
SQL strings.

I write Ruby all the time, I don't know a lick of C. I've got a solid
understanding of SQL and relational data modeling, but why should I check the
generated SQL of _every_ single query?

Furthermore, if you don't use an ORM, your model's methods will contain a lot
of similar functionality. Hmm, maybe I should write a module or base class so
I don't repeat myself. Eventually you will have written an ORM.

------
TobiHeidi
ORMs dont kill databases. SQL does !

