What ORMs have taught me: just learn SQL (wozniak.ca)
372 points by numo16 on Aug 4, 2014 | 234 comments

Here's a thought experiment. Let's say we lived in a world without SQL and the default way to talk to DBs was through an ORM...

And then someone came and said: "I created this concise and super flexible language for querying data." Would people want it? I think they would, and we'd see tons of articles about vast forests of objects being replaced by small snippets of SQL.

"Object-oriented programming is great. It's what's enabled developers to create the vast world of amazing applications available today. But there's an impedance mismatch between OOP and relational databases. That's why today we're reinventing database access. Say hello to SQL."

If it were released today, they would skip the awkward do-I-spell-it-out-or-do-I-add-vowels part in favor of a catchy-but-meaningless project name and just call it Sequel. Then they could be fresh and say it's the "sequel" to ORM.

(Though you might go for Seequill or something so people could google it.)

From Wikipedia: "...SEQUEL (Structured English Query Language), was designed to manipulate and retrieve data stored in IBM's original quasi-relational database...The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company."

Huh. Interesting. That'll teach me to make jokes without consulting the history books.

When I look at SQL through the lens of hindsight, I see a language that's not amenable to IDEs (it's harder to autocomplete columns when you must write them before the table name, for example), and one with questionable, verbose syntax.

While straight relational algebra is actually quite readable, despite all the efforts of most of the anti-ORM crowd, at the end of the day the business logic that works on business objects is much more important than the storage backend, and thinking in terms of objects seems to be the preferred alternative when reasoning in that context.

I've been doing backend logic for quite a while now, and I have to say that the special features RDBMSs offer are great for reporting, but reporting is a very, very small part of what most code that interacts with the business domain does. And while that may be because SQL is great at aggregating data, the fact that it can't easily be composed with the rest of my business logic is a huge impediment.

This wouldn't be a problem if my entire business logic were to reside in the database, but real world applications interact with external APIs, regular files, and a whole other bunch of stuff. So the fact that ORMs speak the language of my business logic is a far greater advantage.

I also find the inefficiency argument to be a strawman. Well-written ORMs are quite explicit (and lazy) about what they're doing, and standard best practice dictates that you properly describe the scopes and fields you're fetching once your rows become wide enough. But SQL demands that too; you can just as trivially fetch * from a table. And SQL's limitations mean that you don't have access to all the sweet abbreviations ORMs provide, such as scopes, custom query managers, aliases, built-in result caches, and being able to avoid the worst cases of vendor-specific SQL.

I started using ORMs because I was tired of writing the same SELECT statement with 10 slight variations, stored in a source file for a different language, having to deal with row casting, and being unable to plug in simple code to fetch related entities. SQL thus far has not advanced one bit in this area, and until someone comes up with a way to modularize the language so it can provide those features, purists will keep complaining while most of us keep using ORMs to avoid verbosity, bugs, and compatibility problems.

I question the claim that reporting is a small part of what interaction with the business domain is about. For many shops, I'd honestly wager the opposite is true.

Of course, there is a very thin line between "reporting" and "transforming to better deal with an exploration algorithm." That is to say, my assertion is that it is when folks try to come up with "one true model" of their data that things start to suck. Whether in ORM land or straight SQL land. Or just flat-out "in memory" land, honestly.

Sorry, Toad or Workbench and the MySQL tools are perfectly acceptable IDEs for SQL development.

"Perfectly acceptable" doesn't reach the standard of type inference and feedback that you get from modern IDEs for their supported languages. Visual Studio gives far more feedback for LINQ than for SQL, and it's damn useful for debugging queries.

I suggest you need to learn to code SQL and the procedural extensions for the dialect in use, then - and OO is not the be-all and end-all of development.

I have over a decade of writing SQL, thank you very much. I still stand by my assertion, and there's no way I'll be writing procedural SQL for the use cases I deal with on a daily basis.

There is another theory which states that this has already happened. See: ISAM databases as found in COBOL programs.

The problem was SQL got standardized, and became static. There is so much room for innovation in the database language space, but that would mean breaking compatibility with the SQL standards...

> The problem was SQL got standardized, and became static.

SQL hasn't been static since it was first standardized, much as C++ hasn't. However, there may be a good case that many of the central, underlying design decisions, which are by now difficult to extract without tossing the whole thing overboard, could have been made better with the benefit of several decades' more experience in PL design and in understanding how databases are used.

> And then someone came and said: "I created this concise and super flexible language for querying data." Would people want it?

Sure, but if we're going to imagine that, hopefully it looks a lot more like a D [1] than like SQL.

[1] As described in The Third Manifesto, http://www.dcs.warwick.ac.uk/~hugh/TTM/TTM-2013-02-07.pdf

I used to write raw SQL for many years; then, around 2005, I switched over to ORMs in order to be able to target different databases, have a nice model, etc. Let's be honest here, the ease of just doing:

    p.username = "Carl"
    p.age = 33
instead of "update users set username=:username, age=:age where id=:id" has a ton of advantages. For one, some sort of syntax or type checker is actually trying to understand your queries and makes it easy to find typos before the database laments in the middle of a huge transaction. Strongly typed languages are even cooler here (most notably Slick for Scala, which has a fully type-checked DSL for database querying that makes it really difficult to introduce typos) [1].

However, the downside of this is that it only works as long as whatever you need from your database is the lowest common denominator of database features. I'm currently working on a Clojure + Postgres project and I'm using all the bells & whistles that Postgres has to offer: HStore types, JSON types, subqueries in subqueries, upserts, functions, etc. I have a rather complex database that I'm running analytical queries against, and using an ORM for this would simply not work. I'd spend half my time trying to figure out how to implement a certain Postgres feature in <insert ORM name here>. I actually started with Korma [2], a simple Clojure ORM, and gave up because it was too tedious trying to figure out how to get it to correctly run my 4-function column result query.

I've since switched to YeSQL [3], which follows a very interesting idea: you write queries in .sql files (full editor support, yay) and tag every query with a name. YeSQL then reads these .sql files at compile time and dynamically generates Clojure functions with the correct number of parameters based on these queries. It is kind of the best of both worlds. Example:


    -- name: get-users
    -- a function to retrieve the user by name
    select * from users where username=:name

    (defqueries "some/where/queries.sql")

    ;; now I can do:
    (get-users "carl")
I think it really depends on the use case. If the database model is simple and there are no crazy database technologies in use, I'd rather go and use an ORM again, I guess.
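The defqueries trick is mostly parsing: split the .sql file on `-- name:` markers and bind each query to a callable. A rough sketch of that parsing step in Python (a hypothetical helper, not YeSQL's actual implementation):

```python
import re

def parse_queries(sql_text):
    """Split SQL text into {name: query} using '-- name:' markers,
    mimicking YeSQL's convention (a sketch, not YeSQL itself)."""
    queries = {}
    name = None
    body = []
    for line in sql_text.splitlines():
        m = re.match(r"--\s*name:\s*(\S+)", line)
        if m:
            if name:
                queries[name] = "\n".join(body).strip()
            name, body = m.group(1), []
        elif line.startswith("--"):
            continue  # doc comment attached to the query
        elif name:
            body.append(line)
    if name:
        queries[name] = "\n".join(body).strip()
    return queries

sql = """
-- name: get-users
-- a function to retrieve the user by name
select * from users where username = :name
"""
print(parse_queries(sql)["get-users"])
# → select * from users where username = :name
```

YeSQL additionally inspects the :name-style placeholders to generate functions with the right arity; the dict above is just the first half of that.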

[1] http://slick.typesafe.com/ [2] http://sqlkorma.com/ [3] https://github.com/krisajenkins/yesql

YeSQL seems a little bit like it's reinventing the Microsoft data access ecosystem of 10-15 years ago - stored procedures behind a code-generated API. Retro is cool.

The Clojure SQL ecosystem is weird. clojureql seemed wonderful for a while, but has been left to rot and the fact that nobody's really picked it up implies people have just moved on with their lives. Korma and YeSQL seem to handle most of the Rails-like use cases, and I guess everyone else has moved to more esoteric datastores.

Except stored procedures have to be stored in the database AND usually end up in source control too. At least with this library there is one source for your queries. Also, they are compiled into your language as first-class functions, which I like too... kind of like how JSON is a first-class citizen in JavaScript.

I really like the approach.

I think a lot of people (including myself and my team) just tend to get on with their lives and use clojure/java.jdbc. Sure, it's not that sexy, but it gets the job done.

And for the 80-90% of queries, you can create a small function or namespace with a nice API that converts to the types you like.

> However, the downside of this is that it only works as long as whatever you need from your database is the lowest common denominator of database features.

This can be an overriding practical objection to every ORM I've encountered so far.

To give an example I've run into several times, Postgres offers several levels of transaction isolation. The more isolated levels offer stronger guarantees, but you also need to be able to recover and retry if a transaction fails a serialization check the first time. These levels determine how interactions work with related tools like explicit locking, SELECT FOR UPDATE, and so on.

In least common denominator ORM world, you're lucky if you get any serious control over this kind of thing at all. If you actually have a use case that requires precision here -- and sometimes you do even in surprisingly simple use cases, such as needing to allocate new IDs in an increasing, guaranteed contiguous sequence in Postgres -- then this stuff matters.

Writing longhand SQL queries is a pain for several obvious reasons, but it's still better than using an ORM and finding it doesn't support a feature you need or, worse, it does things implicitly and sometimes gets them wrong.
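To make the retry requirement concrete, here is a minimal retry loop in Python. The exception class and transaction callable are stand-ins; with a real driver such as psycopg2 you would catch the serialization-failure error (SQLSTATE 40001) and wrap the callable in BEGIN/COMMIT/ROLLBACK:

```python
class SerializationFailure(Exception):
    """Stand-in for the driver's serialization-failure error (SQLSTATE 40001)."""

def run_serializable(txn, attempts=3):
    """Run txn() under SERIALIZABLE semantics, retrying on conflict.
    A real version would also open/commit/rollback the transaction."""
    for attempt in range(attempts):
        try:
            return txn()
        except SerializationFailure:
            if attempt == attempts - 1:
                raise
            # rollback happened; retry (real code might back off here)

# Demo with a transaction that conflicts twice, then succeeds.
calls = {"n": 0}
def txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationFailure
    return "committed"

print(run_serializable(txn))  # → committed
```

The point is that the retry loop belongs to the application, which is exactly the kind of control a least-common-denominator ORM rarely exposes.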

I'm loving all the momentum towards writing templated-sql, in fact, I wrote a library for this myself[1].

By leveraging jinja2/django-style template inheritance, you can even bring some advantages of ORMs (composition, reuse, and extending) into the raw-sql world.
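The composition point can be shown even without a template engine. Below is a stdlib-only Python sketch (all names invented) of sharing one query skeleton across variants; a jinja2/django template adds real inheritance ({% extends %} / {% block %}) on top of the same idea:

```python
# Stdlib-only sketch of SQL fragment reuse. Only trusted fragments are
# composed here; user-supplied values still go through bind parameters.
BASE = """SELECT {columns}
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE {predicate}"""

def orders_query(columns="o.id, o.total", predicate="1 = 1"):
    """Build a query from the shared skeleton with swappable fragments."""
    return BASE.format(columns=columns, predicate=predicate)

# Two different queries reuse the same FROM/JOIN skeleton:
recent = orders_query(predicate="o.created_at > :since")
totals = orders_query(columns="u.id, SUM(o.total)",
                      predicate="o.status = 'paid'")
print(recent)
```

Never interpolate user input this way; the fragments are developer-written SQL, and values like :since remain bind parameters.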

The OP also intimated that he's taking a templated approach:

"In these cases, I've elected to write queries using a templating system and describe the tables using the ORM. I get the convenience of an application-level description of the table with direct use of SQL. It's a lot less trouble than anything else I've used so far."

[1] https://github.com/civitaslearning/swigql

If you're at all interested in opening a Kickstarter for such a templating library for Django, I'd back it. I have attribute creep all the time and actually generally prefer raw SQL, with the exception of its verbosity. The problem is, migrations are awful and SQL injection mistakes are easy to come by. It would be great to have the best of both worlds in a SQL templating engine + sort-of ORM wrapper that auto-generates via SQL inspection.

Without the ORM, though, what would be the point of using Django? To me it seems like this would be a better fit for a more minimalist platform like Flask.

Not sure I follow? Most SQL templates I have seen behave like Django's .raw() function (which inspects the result set and auto-binds to the object).

I'm slightly envious that you are working on a Clojure project with PostgreSQL, especially involving all the bells and whistles. Got any PostGIS or otherwise geospatial data, on top of all that? :)

Well, it is my own project, so I got to choose the technologies :) It is a lot of fun working on it. The data does have location information, but I'm not sure if I'll use PostGIS for it, as it would be taking a sledgehammer to crack a nut. The location information is rather sparse.

Most people who think they need postgis just need earthdistance. Try it.

Thanks for sharing your experience. I've been meaning to try Slick, and YeSQL sounds like a nice way to reduce some boilerplate with no real downside. I go back and forth about how I feel about ORMs. I think everyone can agree you'll need to learn SQL for any non-trivial project, even if you end up using some abstraction on top of it.

On a tangent: you mentioned Upserts in Postgres features. I thought Postgres didn't have any kind of Upsert. Was it added recently or something?

Postgres indeed doesn't have upsert yet, so I'm going the default route of locking the table and implementing it via a slightly more complex query. I was just too lazy to explain that in my earlier comment. The problem is the same: the syntax below can't really be represented well in an ORM.

    WITH upsert AS (
        UPDATE search_tracking SET count = count + 1
        WHERE keyword = 'John Doe'
        RETURNING *
    )
    INSERT INTO search_tracking (keyword, count)
    SELECT 'John Doe', 1
    WHERE NOT EXISTS (SELECT * FROM upsert);

You can use writable CTEs as an upsert in postgres. http://dba.stackexchange.com/questions/13468/most-idiomatic-...

But this ain't that hard; I suppose it could also be done in Postgres (query using MS SQL Server):

    UPDATE Table1 SET (...) WHERE Column1='SomeValue'

    INSERT INTO Table1 VALUES (...)

not atomic

Thus transactions...

Postgres still doesn't. I usually write a rule to do so. For instance, here's my "ON DUPLICATE KEY UPDATE eid=eid":

  CREATE RULE location_updates_on_duplicate_ignore AS
    ON INSERT TO location_updates
    WHERE (EXISTS (SELECT 1 FROM location_updates WHERE (eid = NEW.eid)))
    DO INSTEAD NOTHING;
But you can also do much more advanced merge logic by replacing DO INSTEAD NOTHING with DO INSTEAD <statement>. Mine just needed to ignore already-submitted batches.

Side Note: Slick has basic parameterized queries/statements/DDL.


  val cachedPurchasesQuery = Q[String, (Int, AccountId, Timestamp)] + """
    |SELECT pi.purchasable_item_id, u.account_id, pi.created_at
    |FROM purchased_items AS pi
    |INNER JOIN users u ON pi.user_id = u.id
    |WHERE purchasable_item_type = ? AND u.account_id IS NOT NULL
    |""".stripMargin
Then you call that like:

  val magazinePurchases = cachedPurchasesQuery("Magazine").list
Or `foreach`, `firstOption`, etc. Most of the normal collection-y stuff.

I prefer this over the Table mapping DSL and for-comprehension stuff personally. But I've only used it in production on smaller projects that are limited to under a dozen queries/statements or so.

I really like scalikejdbc [1] over Slick, especially using the parameterized queries. They handle it more elegantly than Slick, which even for Scala has a pretty steep learning curve. I also really like that I don't need a specific dialect to get it working on newer restricted SQL DBs/OLAP systems like Phoenix [2] or Presto [3].

[1] http://scalikejdbc.org/ [2] http://phoenix.apache.org/ [3] http://prestodb.io/

Writing SQL by hand doesn't have to mean you abandon things like autocomplete and automatic highlighting of typos. SQL can be inspected by a proper IDE just like any other language.

To be fair, SQL has a syntax that is hard to provide (for example) autocompletion for, as the table comes after the fields, and the field names can be ambiguous.

The PostgreSQL command line (psql) still manages to do an excellent job at it (but I concede it's pretty hard to replicate).

IntelliJ does SQL completion quite nicely based on a connection it can make with your DB: http://www.jetbrains.com/idea/features/database_tools.html

It seems this would be bypassed by letting the IDE hit the DB to fetch table information.

> Strongly typed languages are even cooler here ...

No experience with Slick in particular, but I've been using jOOQ [1], which I believe is similar.

To be honest, I'm not entirely sold that these DSLs are what I'd consider "strongly typed." I can get jOOQ to pretty easily yield queries that won't work if I switch out database dialects. (Ignoring, for a moment, that jOOQ will let you embed SQL fragments as strings.)

As an example: jOOQ will happily let you write an update query targeting a table bound to an alias. This type-checks just fine. In fact, this query will even work in Postgres, but it yields a syntax error from the database if you target SQL Server 2008 instead.

Then there's the issue of the dialects letting you use features the RDBMS doesn't support. For example you can build a merge statement regardless of which dialect is selected. If you were to select the Postgres dialect you'll get an "unsupported exception" at runtime.

If a method is RDBMS specific (in practice) then why is it part of the generic API?


My biggest beef with all these SQL abstraction layers is that many of them claim to be "write once, run anywhere." In practice, I've just never seen that be true.

I wish they would incorporate some sort of "capabilities" system that could run at compile time. Just imagine: when you swap in the Postgres driver your IDE throws a little red squiggle under the `.merge()` call, "method not found."

[1]: http://www.jooq.org/
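The capability check is easy to sketch at runtime, even if the compile-time red squiggle needs IDE support. All names below are invented for illustration and are not jOOQ's actual API:

```python
# Invented sketch of a per-dialect capability gate (not jOOQ's API).
CAPABILITIES = {
    "postgres": {"upsert_cte", "returning", "lateral"},
    "sqlserver2008": {"merge", "top"},
}

class Dialect:
    def __init__(self, name):
        self.name = name
        self.caps = CAPABILITIES[name]

    def require(self, feature):
        """Fail at query-construction time, not mid-execution."""
        if feature not in self.caps:
            raise NotImplementedError(
                f"{self.name} does not support {feature}")

pg = Dialect("postgres")
pg.require("returning")    # fine
try:
    pg.require("merge")    # Postgres (as of 2014) has no MERGE
except NotImplementedError as e:
    print(e)               # → postgres does not support merge
```

Failing when the query object is built is still later than a compile error, but much earlier than an "unsupported" exception in production.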

> To be honest I'm not entirely sold that these DSLs are what I'd consider "strongly typed."

Absolutely! They're "quite" typesafe, much more than string-based SQL. Much less than actual compiled stored procedures.

> As an example: jOOQ will happily let you write an update query targeting a table bound to an alias.

Yes, that currently cannot be detected.

> Then there's the issue of the dialects letting you use features the RDBMS doesn't support

That will be addressed in the near future when we implement an API "preprocessor" that will effectively remove all parts of the API that are not supported by your given dialect. We'll also distinguish between native support and emulated support, if this strictness matters in your application.

The relevant issue is here: https://github.com/jOOQ/jOOQ/issues/720

> In practice I've just never seen that to be true.

True, but if you need "write once, run anywhere," you're probably much better off with a head start than if you started from scratch.

> To be honest I'm not entirely sold that these DSLs are what I'd consider "strongly typed." I can get jOOQ to pretty easily yield queries that won't work if I switch out database dialects.

I can get Hibernate to pretty easily yield queries that won't work if I use a mildly unusual combination of JPA features. weeps

> I can get jOOQ to pretty easily yield queries that won't work if I switch out database dialects.

In fact, you can do this with SQL. The following is perfectly valid in MySQL but not in any of the other dialects:

    SELECT a, b, COUNT(*)
    FROM t
    GROUP BY a
SQL is so heavy with implicit semantics that cannot be expressed in syntax alone...

Let's just be honest: ORMs add a lot more weight than just "p.username = 'Carl'". You do not always have a type checker; besides, there are unit tests. Using pure SQL is not just masochism.

web2py gets a lot of hate from some parts of the python community (with Ronacher & Moss-Kaplan being the most prominent examples), but its DAL (Database Abstraction Layer) feels like the right way to approach this: It's not an ORM, but it gives you a lot of what ORM gives you; It's not directly SQL, but the mapping between DAL code and underlying SQL is almost trivial (though it might depend on the specific DB). It just works well, and can be used independently of web2py.

It will even do database migrations for you if the schema evolution is reasonably simple.

I actually like this a lot... This would replace my whole Service, Data, and Repository layers in ASP.NET MVC (DDD pattern).

I suppose I could execute this with EF (http://goo.gl/yrpver, Stored Procedure mapping), and this way I can still map my table in code (using Code First).

Is there something like YeSQL for Go?

Not that I've seen, but I'd love to see it. You could probably build it with something like text/template and write:

    SELECT id, title, expiry_date FROM posts WHERE id = {{ id }} AND expiry_date > {{ date }} ORDER BY expiry_date
This would give you some flexibility--the template tags would be replaced by ? or $1 or :name depending on the database driver set on parse--and leans on the stdlib. This would tie in nicely with https://github.com/jmoiron/sqlx and its struct tag/marshaling behaviour.

Access could be handled by a map[string]string, with the key as the filename and the value as the SQL (as a first thought).

Forgive my ignorance as I'm not very familiar with golang, but what you're describing above looks (to me) like a parameterized query. Nothing real fancy about that. I'd assume that Go has support for parameterized queries already.

  getUser, _ := db.Prepare("select * from users where username=$1 limit 1")
  row := getUser.QueryRow("john")

If you have been writing raw SQL for years, I hope you did not use SELECT *.

I've caught a lot of flak for saying this, but I'm convinced that all ORMs are ultimately tech debt. Sure, they get you up and running quickly, but once you're there, you'll invariably find yourself wanting to do things that require you to work against and around your ORM. By pretty much any definition I've ever encountered, that's "tech debt."

For the last three years or so, I've been telling anyone who asked that ORM is an antipattern, to be avoided at all costs. I've settled into wrapping all queries in classes, with any parameters exposed as public properties. The SQL is written inside the class, essentially in a template. When necessary, the generated query can change based on the values assigned to the properties. All the mechanics of how the query is executed and passed to the database get rolled up in the base class, with a public method that returns the dataset returned by execution. Then I just run the dataset through transformation functions to get the data structures I actually need - combining multiple result sets if necessary to build really complicated objects when I have to.

I've been building this approach for a couple years now, and I haven't regretted it for a minute. I don't have to fit my databases to the vagaries of the ORM layer, and I have full control over my queries. I NEVER want to go back to ORM :)
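A minimal sketch of that shape in Python (hypothetical names; sqlite3 stands in for whatever database the real code targets):

```python
import sqlite3

class Query:
    """Base class: subclasses set `sql`; parameters are public attributes
    named in `params`. A real version would also handle connections,
    transactions, and the result-transformation step."""
    sql = ""
    params = ()

    def rows(self, conn):
        args = [getattr(self, p) for p in self.params]
        return conn.execute(self.sql, args).fetchall()

class UsersOlderThan(Query):
    sql = "SELECT name FROM users WHERE age > ?"
    params = ("min_age",)
    def __init__(self, min_age):
        self.min_age = min_age

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("ann", 40), ("bob", 25)])
print(UsersOlderThan(30).rows(conn))  # → [('ann',)]
```

The SQL stays fully under your control, and the class boundary is where the dataset-to-domain-object transformation lives.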

> I NEVER want to go back to ORM

Except that you just invented your own ORM.

Think about it: you are encapsulating SQL data into classes, in other words, mapping relational data to objects.

That's an ORM.

You've just broadened the definition of ORM so that any SQL abstraction layer in an OO language becomes an ORM.

That's a rather nonstandard usage.

ORM is "Object Relational Mapper".

If you are taking data out of a relational database and mapping it into objects, you are implementing an ORM.

Seems like I'm sticking to the exact definition of an ORM, aren't I?

> If you are taking data out of a relational database and mapping it into objects, you are implementing an ORM.

No, ORM is a particular approach to doing that; the query abstraction approach described upthread is closer to the DAO pattern, to which ORM is an alternative. People were using RDBMSs to provide a persistence layer for OO programs before ORM was a thing, but as you have broadened the term any use of an RDBMS to store/retrieve data used in an OO program would be "ORM".

Now you're the one using a nonstandard definition of ORM. Here is how Wikipedia defines it:

    Object-relational mapping (ORM, O/RM, and O/R mapping)
    in computer science is a programming technique for
    converting data between incompatible type systems
    in object-oriented programming languages.

"is a technique for" does not mean "includes every technique for".

Spearfishing is a technique for catching fish, but not every technique for catching fish is spearfishing.

I believe the difference here is that one is an ORM framework, while the other is just objects bound to a database.

ORM frameworks usually allow you to drop down to SQL. But then you're stuck in that terrible world of depending on poorly-documented and soon-to-be-deprecated (or already deprecated) internals. Which are pretty much guaranteed to not fit what you really need anyway. Then you have to bridge your hack objects with the "proper" ORM objects. Your crufty hack objects will probably never be seen as first-class citizens in ORM land, forever banished to edge case hell.

I think it's pretty clear that (s)he's talking about ActiveRecord.

I've used this pattern and I like it a lot (for the reasons you say), but I find I still spend way too much time on really mundane stuff... particularly table<-->object mapping if the database table has a lot of columns.

My current approach is:

1. I basically judge ORMs on how easily they allow me to use custom SQL. (ActiveRecord makes this pretty tolerable with find_by_sql.)

2. Let the ORM handle as much CRUD and table<-->object mapping as possible.

3. If I have complex SQL, I try to wrap it in an appropriate database object (view, sproc, function).

(caveat: I'm a developer but I haven't used ORMs very much.)

Don't most ORMs let you write raw SQL when you really want to? In that case, you could use the ORM for simple things, but revert to raw SQL when you need more power. Or is that not the case?

Yes, but the ORM often influences the schema design. That can be very painful down the road when you realize your tables are actually tables, rather than instances of objects, which would be what your ORM led you to believe.

I think the problem is not that an ORM often influences schema design, it's that Relational Databases/SQL often influence application design.

People complain that an ORM isn't using a relational database effectively. The greatest contribution of the rise of ORMs is showing that relational databases are hard to use properly.

Bring on the ACID compliant document databases.

No, you have it backwards. RDBMSs are as they are because of maths (relational algebra and calculus). There is deep theory behind doing things this way. You can put data in without needing to know how it will be accessed and used (and vice versa). NoSQL just doesn't have this rigor. You have to tightly couple what creates the data with what consumes it. THAT is just begging for trouble down the line.

That is absurd. Saying that using a certain data access API tightly couples you to it is as crazy as saying using variables tightly couples you to RAM (and using RAM is obviously bad!).

I honestly don't think you understand what tight coupling means.

You have data. To access the data you use an API. SQL forces you to use a generalized API, which is very old, hard to use and, more importantly, hard to test.

If you actually want rigor in your SQL API, you use stored procedures. So now you're maintaining two languages (SQL and stored procedures) in addition to your application language.

For me, for most things, I just write a webservice which talks to whatever database I want. That's the API I expose. Anything can consume the API as long as it follows my RESTful spec.

With this architecture, I: 1) don't have to struggle with SQL, making development faster; 2) don't need a DBA, making development cheaper; 3) get to write in one language, making testing a lot easier, which in turn makes quality higher; and 4) can scale more easily, picking whatever data storage characteristics are important to me.

Lastly, saying that NoSQL just doesn't have this rigor really makes me wonder what you think of Google's Bigtable or Amazon's SimpleDB.

That's a good point, I think I agree.

However, my gut tells me that the instance/object nomenclature that ORMs require does a lot more harm than good in the long run.

I think, more specifically, the rows in your tables are data. And data tends to long outlive even many application redesigns.

Often that is not the case as ORMs keep a lot of internal state, so if you go bypassing them you no longer get any benefit from the ORM at all.

Either that or you can write in a SQL-like language (e.g. JPQL for JPA).

Depends; some libraries, like ActiveRecord for Ruby, make it hard to drop down to SQL (you lose data type conversion, etc.).

Also see what the others said about the ORM influencing your database schema.

That's been my experience. From what I can tell the only time ORM tools actually make sense is when you have a lot of tables (or fewer tables with a lot of columns) and your interactions are very simple - i.e. you're mostly dealing with one table at a time.

But this is pretty much an edge case. For simple applications I can write all the SQL in less time than it takes to configure Hibernate, and for applications that require me to join seven tables in every query, use analytic functions, or share schemas with other applications, ORM tools don't handle the complexity very well.

Interestingly enough, no one ever listened to Gavin King (creator of Hibernate), when he said that you shouldn't use an ORM for everything.

It is relatively easy to draw a clear line between using:

  * ORMs for domain model persistence (complex, stateful CRUD)
  * SQL for relational model interaction (complex, stateless querying)
Bottom line:

  * Don't use ORMs for querying
  * Don't use SQL for complex CRUD

Thank you for sharing this. Do you have a source for this quote? It completely reflects my experience over the years of what makes sense given the relative merits and pitfalls of each method.

For a long time, I had no idea that SQL & OO are not friends. I worked in a language where such problems don't exist, and it had nice ways to move data between tables, data structures, and objects (for example: SELECT .. INTO ARRAY name).

It's FoxPro. Even the stored procedures were FoxPro, all along the stack, from the UI to inner DB actions. It's a shame that this kind of programming is "lost" today.

This is kind of like working with the PostgreSQL main language everywhere, but the language is not half-powerful. You don't need a "database" language and an "application" language. In fact, the whole experience was very good, and it even let some people build their own apps (kind of like Access) because the approach was not intimidating the way VB, C, or Python is (i.e., towards doing database apps).

So, in short: back then we didn't use an ORM; we didn't need it; we still had nice syntax and helpers. I still try to backport some of this to my own code, but SQL is SO restrictive that the effort stops fast.

Dear Lord in heaven, no.

Yes, the data language and the application language were all the same language, which led to some awful spaghetti code where data access was strewn all over the place.

Then you also get to deal with scaling issues because your db and application logic all run on the same box.

But hey, you can still download and run Visual FoxPro from MSDN if you want.

Source: My team is in the process of slowly strangling a FoxPro application whose only system spec is "what it does".

That is the description of almost all apps out there. Only with the popularization of MVC did it become cool to decouple things. I see the same behavior across several codebases and languages (I do mainly business apps and code cleanup).

But then, in the case of VFP, you at least have a single kind of mess to clean, and a language/API small enough to fit in your head. Now we are talking about several kinds of languages, APIs, and paradigms intermixed all around a project (for example, mixing an ORM, SQL, and OO, not only python+js+html+css).

You know, I remember Visual FoxPro coming with Microsoft Visual Studio 6.0, and not really understanding much about it. Care to expand upon what made it awesome?

I found some stuff here but it doesn't really explain it well: http://www.foxprohistory.org/articles_4.htm

It shines for business apps (aka the most common kind).

It's more "batteries included" than Python. You can do a full app with just Fox, because it includes reporting, a database, a database designer, a form builder, a menu builder, OO, procedural code, SQL, and a really good grid. Seriously, I only know of 2 great grid controls: the VFP one and the DevExpress one for Delphi.

Having a truly good grid was something damn useful. Manipulating data was easy; you could build an app around the grid alone!

And all of this came included.

And this could be made to work for the end user. Your reports could be designed by an end user. Or forms. Or menus. Similar to Lisp, where everything is "()", in Fox everything was a table. A form? A table. A report? A table. A class? A table (if designed visually; you could also write everything in text like a regular program).

The whole app could be made dynamic if the developer learned to use the language fully.

It was also relatively easy to grasp. A power user could work in Fox with a dozen commands (and remember: the grid, like an Excel spreadsheet, was enough for a lot of tasks), but in contrast with Access, you could "upgrade" the mess to be more professional.

It's simple for a programmer, too. My first programming class was with FoxPro, and by the end of it we had made a FULL app with menus, forms, tables, reports, and mini programs.

The closest thing today? Perhaps the Django admin, but it is far less powerful and more constrained.

The weakness of Fox was that MS preferred to push Access (both fought for the same kind of user), neglected the DB engine in favor of SQL Server, and, well, neglected the language.

Ah yes, FoxPro! This is the first web language I ever learned. It was awesome because the DB and application language lived in the same space. You could run SQL directly inside the language...

When I learned the "best practices" of the time (1996-ish)... I thought it was extremely inefficient to have to "formulate a SQL statement", then call it, then retrieve the result...

Most of the time I see arguments like this, the author probably has an inconsistent and de-normalized schema. With the caveat that all growing systems gather warts, and yes, it's nigh on impossible to keep your data normalized at scale with sufficient complexity, an ORM coupled with a decently designed schema is an unbeatable combination. Not only do you get fast development, code re-use, and easy benchmarking, but you also get all the knowledge and expertise that's been baked into the ORM. Most ORMs now come with miles of security features built in, and are able to infer and optimize on complex queries and joins better than your average non-DBA developer (assuming your data size isn't in the millions of rows, per table, with < 3 tables to join... aka 90% of all queries).

Of course general solutions sometimes can't match up with tightly coupled, highly optimized, extremely specific hand-crafted SQL, and duh, making things easier can encourage bad practices, but that's just parroting tautologies and ignoring all of the benefits you get with an ORM. In fact, good ORMs will even provide tools to allow you to construct custom queries in the same scope as an ORM query, allowing you to have your cake and eat it too. Why would you throw out the baby with the bathwater when you can just rewrite your most egregious 10% of queries in SQL, while allowing the rest of your app to merrily chirp on? Hopefully one day someone super smart can create a brilliant ORM that incorporates machine learning, whizbang functional tools, and insert trendy something or other here, and we can all relegate SQL to the status of "DB assembly" where it belongs. I, for one, am embracing an era where devs no longer have to schlep around arcane DBA wisdom tidbits in comments above grotesque SQL queries. Why would you ever reject high-level abstraction in favor of tedious minutiae?!

P.S. This is all ignoring the ease with which you can switch database backends/engines with ORMs, which could be a 500-word comment all on its own.

> A recent count of one such table in my work resulted in over 600 attributes and 14 joins to access a single object, using the preferred query methodology

At some point between inception and having 600 columns, someone had to have stepped in and asked, "is this necessary?" Maybe I haven't worked with big enough data sets, but to me that number seems insanely high - and I really don't see how the ORM would be any worse than raw SQL - if they were equally optimized.

It sounds to me like best practices SQL are being compared with worst practices ORM. Now many ORMs, by default, may lead a developer to use bad practices, which I will concede is a problem.

But when optimizing a query with SQL, you select only the necessary fields and use no more joins than required for the data you need - so you only gain over the ORM if you didn't limit the scope of the object you asked the ORM for and instead asked for the full object.

> Attribute creep and excessive use of foreign keys shows me is that in order to use ORMs effectively, you still need to know SQL.

Well, there is your problem! Thinking you can completely ignore the underlying database mechanism of an abstraction layer is a little naive - if you want to maintain maximum performance. Though the bad marketing of some ORMs may claim differently, which is a problem as well.

I agree, there is a potential downside to ORMs. Raw SQL likely will get you better performance almost 100% of the time. Is that necessary? Well, it depends. If the ORM is being used with best practices, I'd say the difference is negligible until you're at the scale of billions of pageviews a month. However, if you're essentially doing a SELECT * and JOINing on all related tables just to get one field from the Users table - then the SQL will win hands down.

"It sounds to me like best practices SQL are being compared with worst practices ORM."


I've written bad SQL and good SQL. I've written bad ORM stuff, and good ORM stuff. You don't learn how anything is bad until you make mistakes and learn from them. Ideally you learn from others' mistakes too, but some things you just end up having to internalize through experience.

I still take a decent ORM over raw SQL for 90% of the work I do, because much of it is boilerplate/repetitive stuff. Knowing something else will handle escaping and basic relations for me without a whole lot of boilerplate behind it is great, then I write some SQL by hand when I hit a wall with the ORM (complexity or performance). If you're trying to fit 100% of every single data query in to an ORM, and you bend the data too much to fit the constraints of the ORM tool, yes, there will be problems. But that's sort of just common sense - once you feel you're forcing a tool to do something it's not suited for, back up and ask if there's a different way to get your results; doesn't have to mean throwing out the whole tool.

I've worked with enough stupid "raw SQL only" projects to know the real answer to all this is experience and knowledge, vs just following 'one true path' regardless of your ability to understand it. Ever had a user table with 190 columns, named "is_usa", "is_argentina", "is_germany", etc, one for each country, so the developer could determine what country someone lived in? Every single request, 190+ queries: "select is_argentina from user where id=5", "select is_iran from user where id=5", etc. But hey, it was done by hand - no evil ORM to hinder the awesomeness of raw SQL, right?
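
For anyone who hasn't seen that anti-pattern, a hypothetical sketch of the difference (schema and names invented; sqlite3 standing in for the real database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sane schema: one country column instead of 190 is_<country> booleans.
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, country TEXT)")
conn.execute("INSERT INTO user VALUES (5, 'argentina')")

# Shape of the anti-pattern: one query per candidate column, so ~190
# round trips per request ("select is_usa ...", "select is_iran ...").
countries = ["usa", "argentina", "germany"]  # imagine 190 of these
hits = [c for c in countries
        if conn.execute("SELECT 1 FROM user WHERE id = 5 AND country = ?",
                        (c,)).fetchone()]

# With a single country column it is one query and one round trip.
country = conn.execute("SELECT country FROM user WHERE id = 5").fetchone()[0]
print(hits, country)  # ['argentina'] argentina
```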

> I've found myself thinking about the database as just another data type that has an API: the queries. The queries return values of some type, which are represented as some object in the program. By moving away from thinking of the objects in my application as something to be stored in a database (the raison d'être for ORMs) and instead thinking of the database as a (large and complex) data type, I've found working with a database from an application to be much simpler. And wondering why I didn't see it earlier.

Exactly. ORMs feel to me like the result of trying to stick two different abstractions together with duct tape and hoping it will save us from writing code.

The two abstractions would be:

- #1 Database as "just another data type that has an API", i.e. what is stored in the database and what your business objects are, are two different things and should be explicitly separated. ORMs break this with the Active Record pattern.

- #2 Database as invisible persistence layer that automagically stores the state of your business objects, in the same way as garbage collector automagically takes care of reclaiming unused memory. ORMs break this by leaking SQL all over the place.

Getting #2 and not having to deal with RDBMS explicitly would lead to a great simplification of code, however we seem to be nowhere near that with current libraries (I heard some rumors that some Common Lisp Metaobject Protocol magic can get close, but I need to see this with my own eyes). So until that time I try to stick to abstraction #1. I do use Active Record, as it simplifies things, but try to explicitly avoid treating ORM objects as business objects.
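
A minimal sketch of abstraction #1 in Python (all names invented): the database sits behind a small query API, and queries return plain values rather than persistent business objects:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class UserSummary:
    """A plain value returned by a query, not a persistent object."""
    name: str
    order_count: int

class UserQueries:
    """The database as 'just another data type that has an API'."""
    def __init__(self, conn):
        self.conn = conn

    def summaries(self):
        sql = """SELECT u.name, COUNT(o.id)
                 FROM user u LEFT JOIN orders o ON o.user_id = u.id
                 GROUP BY u.id ORDER BY u.name"""
        return [UserSummary(name, n) for name, n in self.conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO user VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1), (11, 1);
""")
print(UserQueries(conn).summaries())
```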

ORMs are great for simple CRUD operations. As soon as you want to do anything mildly complex or desire efficiency, you need to write SQL.

As long as the ORM helps me with the former and gets out of my way for the latter, I'm totally happy to utilize them.

I agree with this, but SQL is also great for writing CRUD, and 99% of database interaction is CRUD.
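
As a reminder of how small plain-SQL CRUD can be, a Python sqlite3 sketch (table invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")

# The four CRUD verbs in plain SQL, with no mapping layer in between.
conn.execute("INSERT INTO user (name) VALUES (?)", ("ada",))          # Create
name = conn.execute("SELECT name FROM user WHERE id = 1").fetchone()  # Read
conn.execute("UPDATE user SET name = ? WHERE id = 1", ("ada l.",))    # Update
conn.execute("DELETE FROM user WHERE id = 1")                         # Delete
print(name)  # ('ada',)
```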

That may be true by frequency of query execution, but not by frequency of appearance in an application.

That's absolutely not true. Sure, you can't write a procedure with an ORM. But good ORMs have query builders that allow you not to write ANY SQL. At all. ORMs are not just about CRUD at all.

The biggest downside to this that I found is that you have to have a lot of trust in the creators of that ORM.

Sure, it will generate a query that gets you what you want - but will it use the right indices? Will it use the correct ordering for optimal results?

Generally (unless this has changed in more recent years) the answer is 'no'.

Another problem with ORMs is that they make performance diagnostics much harder. DB-side, you might have a list of worst-performing queries and examining it reveals a huge, hundred-line monstrosity of a query. Because the queries are ORM-generated and are not usually very readable it isn't exactly clear which part of the application is generating it (or why). Even further, if your DBA says you could make the query more efficient by changing X and Y, it's not likely that you can take her advice since the query is programmatically generated! So you have the choice of abandoning the ORM in performance-sensitive paths and inserting raw SQL (making the code ugly) or living with poor performance.

I'm a fan of abstractions only when necessary. I want to have control over 100% of the code stack. When performance issues crop up (and they inevitably will), I don't want to just shrug and go "well, Hibernate must be doing something weird..."

This is simply not true, lots of tools are able to tell you exactly which line of code is responsible for which query.

Such as? Using ASP.NET and NHibernate, I've not seen any way to do this other than old fashioned log statements and guesswork.


Worth every penny. You get full stack traces and line numbers automatically with each query.

Not sure about .net since I don't use it but for ruby there are a lot of monitoring services that do this (skylight.io, newrelic etc).

> So you have the choice of abandoning the ORM in performance-sensitive paths and inserting raw SQL (making the code ugly) or living with poor performance.

Most people ask themselves the same question and don't care much about performance. I'm glad you choose performance over code aesthetics. Code aesthetics should only come second to how the system performs. As for the ORM, it has its downfalls. I've never heard of the N+1 query problem in the plain SQL world. It's a problem brought forth by ORMs, because it's just stupid to even think about N+1 when you're trying to accomplish as much as you can in 1 query.
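
For readers who haven't hit it, a small sketch of the N+1 shape versus the single-query shape (schema invented; sqlite3 standing in for any database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO post VALUES (1, 1, 'a'), (2, 2, 'b');
""")

# N+1: one query for the posts, then one more per post for its author -
# the shape a naively used ORM produces when it lazy-loads relations.
queries = 0
posts = conn.execute("SELECT id, author_id, title FROM post").fetchall()
queries += 1
for _, author_id, _ in posts:
    conn.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
    queries += 1

# Plain-SQL thinking: one JOIN, one round trip.
rows = conn.execute("""SELECT p.title, a.name FROM post p
                       JOIN author a ON a.id = p.author_id
                       ORDER BY p.id""").fetchall()
print(queries, rows)  # 3 [('a', 'ann'), ('b', 'ben')]
```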

Django makes queries that are quite readable (and optimized! It makes good decisions to reduce the number of queries), and you can see exactly what SQL is run with the Django Debug Toolbar.

So "[ORMs] make performance diagnostics much harder" is just not true. "Some ORMs make performance diagnostics much harder" definitely is.

Personally, I see value in learning the actual "protocol" that the ORMs are abstracting. For instance, it is convenient to use an ORM if you understand that it is just a large abstraction of the actual syntax sent to the database server. People learning ORMs before SQL have a disadvantage though, in my opinion. Because without learning the SQL required, they are tied down to their particular ORM implementation. It is kind of like learning jQuery before JavaScript.

Over and over I keep finding that just an ORM is not enough, but raw SQL is hideous in a different way.

ORMs map nicely when you are indeed modifying objects, but some things don't map well that way. So don't map them that way! What we need is a low-level abstraction layer alongside the ORM.

The main problem with raw SQL is that what you really want is a genuine programming language. You almost want programmatic access to the SQL AST, so you can generate syntax as opposed to concatenate strings together. Kind of like a DOM API, but for SQL.

Let me give you an example from what I'm working on now. So let's say you have a generic query for fetching monthly aggregated import / export values between countries. Sometimes you want to filter by one country. Sometimes you want to filter by 10 countries. Sometimes you want to use the column containing the inflation-adjusted value instead of the regular column. Sometimes you want the average export instead of the total export.

With a low-level abstraction layer, I can do stuff like already_complicated_query.filter(another_param==5). Or I can write a function like get_world_trade_aggregate(country="USA", aggregate="average") and it'll generate the right query for me. And that's not all: if I have to modify or filter that query further in some other part of the code, in some unexpected way, that's totally doable. It's less often that I have to write a whole new function or duplicate code.

So the point isn't the abstraction, it's the fact that it makes queries modifiable and composable, which is something you just cannot have in raw SQL. So given a query, I can decide to add something to it or change it without having to change the text-based SQL. You know how it's good practice to split your 1000-line function into smaller, modular pieces? With this you can do that with your ginormous SQL query too. AND it's (well, almost) guaranteed to generate correct and safe SQL.

This also makes everything much more maintainable. E.g. If I change table names or migrate fields, often it's a non-issue and there is a single place I need to change that. But with raw SQL you have to go in and do a ctrl-f and hope you catch everything, but often you don't because someone decided that that field name should come from a variable or something.

Raw SQL spits into the face of everything we've learned about programming languages in the past 60 years.

Anyway, the neat thing is that this magical system exists, as the SQLAlchemy Core (http://docs.sqlalchemy.org/en/rel_0_9/core/tutorial.html). Enjoy!
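
The idea is easy to see in miniature. This toy builder is not SQLAlchemy (all names invented); it only sketches what composable query objects mean: filter() returns a new query, and SQL text is rendered only at the end:

```python
class Query:
    """Toy composable query: immutable, extended by returning new objects."""
    def __init__(self, table, columns, conditions=()):
        self.table, self.columns, self.conditions = table, columns, tuple(conditions)

    def filter(self, condition):
        # Composition: build on an existing query without touching its SQL text.
        return Query(self.table, self.columns, self.conditions + (condition,))

    def sql(self):
        where = " WHERE " + " AND ".join(self.conditions) if self.conditions else ""
        return f"SELECT {', '.join(self.columns)} FROM {self.table}{where}"

base = Query("trade", ["country", "value"], ["year = 2014"])
narrowed = base.filter("country = 'USA'")  # reused and extended elsewhere

print(base.sql())      # SELECT country, value FROM trade WHERE year = 2014
print(narrowed.sql())  # ...WHERE year = 2014 AND country = 'USA'
```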

What I didn't mention in my post was that I did use SQLAlchemy Core to write some pretty complicated queries. It's actually quite good. I like it.

There were some spots that things got hairy though, and the code was pretty hard to follow. I don't fault SQLAlchemy here, but I wrote the query in SQL and it was simpler to work with.

SQLAlchemy is absolutely on the right track, but using the core doesn't diminish the fact that you need to know SQL to use it effectively.

I agree with your assertions that just an ORM is not enough, raw SQL is hideous, and that raw SQL spits in the face of programming language advancements. Sadly, it's the assembly language of databases and, unlike CPUs, doesn't have a good abstraction model.

So you didn't use the SQLAlchemy ORM at all, yet you wrote a whole article about how ORMs "don't work", naming SQLAlchemy (strongly implying the ORM) as an example... If so, it would explain why all the complaints you have about ORMs seem to indicate a misunderstanding of the SQLAlchemy ORM:

  * "Attribute creep": query for individual attributes or use `load_only()`, `deferred()`, or other variants.
  * "Foreign keys": the ORM only selects from the relational model you've created; if your model has N foreign keys and the objects you're querying span M of them, that's how many it will use - no "overuse" or "underuse" of foreign keys is possible.
  * "Data retrieval": SQLAlchemy's Query object maps to SQL joins in a fully explicit fashion, no bending over necessary (see http://docs.sqlalchemy.org/en/rel_0_9/orm/tutorial.html#quer... ).
  * "Dual schema dangers": use metadata.create_all() in one direction, or metadata.reflect() in the other; the "dual schema" problem only exists in systems like Hibernate that don't offer such features (and actually it does, just not as easily).
  * "Identities": manual flushing and hand-association of primary key values to foreign keys is not necessary - use relationship().
  * "Transactions": ORMs don't create this problem and only help to solve it by providing good transactional patterns and abstractions.

I'd appreciate if you amend your article to clarify that you only used SQLAlchemy Core, if this is in fact the case. Your key point that one needs to know SQL in order to use an ORM is absolutely true. However, the value of the ORM is not that it hides awareness of relational databases and SQL; it is in that of automating the task of generating database-specific SQL as well as that of mapping SQL statement execution and result sets, specific to the database driver in use, to object-oriented application state, and keeping these two states in sync without the need for explicit and inconsistent boilerplate throughout the application. I discuss this in many of my talks (see http://www.sqlalchemy.org/library.html#talks).

If you worked in soda bottling company, you probably still know how to fill a bottle of soda by hand. It's the complex machinery that does this automatically which allows this task to scale upwards dramatically. Configuring and using this machinery wouldn't make much sense if you didn't understand its fundamental task of putting soda in bottles, however. The situation is similar when using an ORM to automate the task of generating SQL and mapping in-application data to the rows it represents. Removing the need for "knowledge" has nothing to do with it. The goal instead is to automate work that is tedious and repetitive on a manual scale.

The OP wrote: "What I didn't mention in my post was that I did use SQLAlchemy Core to write some pretty complicated queries."

What you seem to have read: "What I didn't mention in my post was that I did use SQLAlchemy Core to write some pretty complicated queries and didn't use SQLAlchemy ORM at all."

It seems to me that it would be more plausible to read: "What I didn't mention in my post was that I did use SQLAlchemy Core to write some pretty complicated queries, in addition to using SQLAlchemy ORM."

>The main problem with raw SQL is that what you really want is a genuine programming language. You almost want programmatic access to the SQL AST, so you can generate syntax as opposed to concatenate strings together. Kind of like a DOM API, but for SQL.

What you described is basically EF+LINQ.

    var query = context.Users.Include("Users.Group")
        .Where(u => u.UserType == UserTypes.Basic)
        .Select(u => u.Group);
That's going to spit out a SQL statement that joins Users to Group and selects all groups where the UserType is UserTypes.Basic. You can even do ToTraceString() to see the generated statement.

> You can even do ToTraceString() to see the generated statement.

And then if you're a control freak, feed that generated query into ScriptDom [1].

[1] http://msdn.microsoft.com/en-us/library/microsoft.data.schem...

The same in Scala + Slick would be:

    val q = Users.filter(_.userType == UserTypes.Basic).map(_.group)
Although I'm not sure what the context.Users.Include bit in the EF example is doing. Both will generate the statement at compile time, and I assume EF queries are composable, as is the case with Slick.

Slick's readability does suffer though with more complex queries -- unfortunate they strayed away from SQL semantics in favor of Scala collections. We'll see where it goes...

Include() on an ObjectContext will return the corresponding foreign key objects. It's a poor man's join to related entities. There is also Join(), but that will join by anything.

Ah ok, I see. Slick actually does have a nice shorthand for fkey joins:

    val q = for{
      ur <- UserRole; u <- ur.user; r <- ur.role
    } yield(ur,u,r)
which would produce the semantic equivalent of:

    select ur.id, ur.a, ..., u.id, u.a ...
    from userrole ur
    join user u on u.id = ur.userId
    join role r on r.id = ur.roleId
Grouping and sorting are where Slick becomes less a readable SQL-like DSL and more a boilerplate DSL full of meaningless tuples O_o

> The main problem with raw SQL is that what you really want is a genuine programming language. You almost want programmatic access to the SQL AST, so you can generate syntax as opposed to concatenate strings together. Kind of like a DOM API, but for SQL.

Congratulations, you just described Arel.

I liberally use rails/active_record where it shines (operating on a single record, or writing composable scopes) but very often find myself leveraging Arel (accessible via YourModel.arel_table) to generate a carefully crafted SQL AST.

At my current job we have two gems leveraging this power: "Massive Record", allowing one to perform bulk operations (insert, upsert) on huge lists of records in an efficient way (instantiating hashes instead of full-blown ActiveRecord instances), and "Chains", which allows one to handle authorization at a per-entity or per-record level with an iptables-like system. There is no way the generated queries could be sanely written by hand, nor as concatenated strings. Arel allows us to build highly dynamic queries while still tuning for performance. Database independence comes as a bonus, and we can easily extend Arel with more nodes, possibly some database-specific ones that get selectively added depending on the configured database.

Arel also allows us to write clean and efficient database migrations, where we basically use your typical ActiveRecord faux models merely for datatable reflection to obtain column names and types.

[0]: https://github.com/rails/arel

> Congratulations, you just described Arel.

Unfortunately SQL leaks through Arel's abstractions and makes it behave in surprising ways. Arel falls short of achieving the goal of being able to modularise and compose queries. The same is true of nearly every SQL connectivity library (or so this article claims: http://www.try-alf.org/blog/2013-10-21-relations-as-first-cl...).

> Unfortunately SQL leaks through Arel's abstractions

This is intentional. Arel allows one to write SQL almost as is, only with native types and features that make it possible to compose and aggregate queries. It's not a 1:1 mapping, but it's not a black-box abstraction either, since the goal is to finely control your SQL. Of note, Arel is neither ActiveRecord nor ActiveRecord::Relation (which is what you get when you use #all, #where, and scopes, and which is sadly often mistaken for Arel).

The article you linked to is wrong; Arel is perfectly able to compose projections and joins - see this gist[0] implementing what the author demands.

[0]: https://gist.github.com/lloeki/2bc0ece35b2ba42b681d

>The main problem with raw SQL is that what you really want is a genuine programming language.

In my experience, if you take a data-oriented approach (where you add value through structuring/modelling data), SQL is just fine. If you take only an application-oriented approach (where you add value by making use of data), you may be tempted into viewing the database as a dumb object store, and SQL starts to creak.

Having said that, in the cases where I have wanted composable SQL, SQLAlchemy has done a fine job.

>With a low-level abstraction layer, I can do stuff like already_complicated_query.filter(another_param==5). Or I can write a function like get_world_trade_aggregate(country="USA", aggregate="average") and it'll generate the right query for me. And that's not all: if I have to modify or filter that query further in some other part of the code, in some unexpected way, that's totally doable. It's less often that I have to write a whole new function or duplicate code.

If you're working in Ruby, there's also Arel[1]. I have my complaints, but it seems to be the exact same thing: Composable, programmatic access to SQL from your application code.

[1] https://github.com/rails/arel

If you are working with Ruby, Sequel is a way better tool for building SQL queries.


If you are using Java, I suggest taking a look at JOOQ. JOOQ implements SQL statements as an AST:


I've been using JOOQ lately and liking it a lot.

Take a look at my library, swigql[1]. You can create a base query and then extend that query however you want via template inheritance.

[1] https://github.com/civitaslearning/swigql

My personal opinion on such libraries is that they're cool, and I think they do make life easier, but they're just a more advanced form of string concatenation. There's no notion of SQL types, syntax, or structure. To follow the template analogy, a low-level abstraction library allows me to not only change the variables but also change the template programmatically. And not by concatenating more stuff onto the template either :P

Prepared statements, by contrast, are also a form of string concatenation. And they have some understanding of type and syntax, but only because they learn them from the server:



> Prepared statements, by contrast, are also a form of string concatenation. And they have some understanding of type and syntax, but only because they learn them from the server:

Prepared statements are NOT string concatenation. The server parses the SQL, builds its internal representation, and then uses the parameters as arguments to that executable statement rather than checking them and splicing them into the text. At no point does a single SQL string exist containing both the statement and the parameters.
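
The distinction is easy to demonstrate with any driver that supports bound parameters; a Python sqlite3 sketch (table invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (name TEXT)")

# The statement text and the parameter travel separately: the quote in the
# value never meets the SQL string, so there is nothing to escape or inject.
evil = "Robert'); DROP TABLE user;--"
conn.execute("INSERT INTO user (name) VALUES (?)", (evil,))

row = conn.execute("SELECT name FROM user WHERE name = ?", (evil,)).fetchone()
print(row[0] == evil)  # True - stored and matched verbatim, table intact
```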

> You almost want programmatic access to the SQL AST, so you can generate syntax as opposed to concatenate strings together. Kind of like a DOM API, but for SQL.

I think this is the appeal of MongoDB's driver on Node: You really do have programmatic access to the AST, insofar as the microlanguage is just a plain old Javascript object.

Though SQL is more universal, Mongo's approach definitely reflects hard thought about the balance between abstract and concrete, and my fellow developers and I find it very intuitive. So intuitive that we use their microlanguage well outside ORM-style applications, for things like sorting, filtering, and object updates.

>Though SQL is more universal, Mongo's approach definitely reflects hard thought about the balance between abstract and concrete, and my fellow developers and I find it very intuitive.

I think you've got the causality and conclusion backwards here. MongoDB's easy programming API is a consequence of its storage layout on disk -- the JSON/BSON bytes on disk. From that principle, you naturally get an "ORM" type of API exposed in the programming language basically for free. I'm saying you don't have to do a lot of theoretical computer science type of research and pondering to get a disklayout+API that looks like that. In fact, the early 1960s mainframes laid out data records as fully denormalized (similar to JSON/BSON) before the relational DBs became popular in the 1970s.

But, if your app starts expanding into complex data that's not 100% embedded as fields within one document, you start needing to do "relational" types of joins across documents. And those joins will manifest themselves as extra programming code/logic on the client side. You're still paying for added complexity. You're just paying in a different way from the ORM+RDBMS programmers.

Many MongoDB projects want it to perform more like a relational database. Likewise, many RDBMS users want the database to act more like objects, hence ORMs. The impedance mismatch looks like irreducible complexity.

You can do this with jOOQ (http://www.jooq.org), provided you're actually creating the original query using jOOQ...

Unlike MongoDB, the DSL really resembles SQL, which makes it far easier to read / write queries for someone who is accustomed to SQL

Have you not heard of PL/SQL? Or its equivalents for other variants of SQL.

> Kind of like a DOM API, but for SQL.

In Java land, the big persistence API is JPA. It's an API for ORMs, but it has features which go a long way towards this - just from the ORM side rather than the SQL side.

Firstly, it has its own query language, JPQL, which is basically a large subset of SQL (on the order of MySQL's SQL rather than PostgreSQL's, eg no window functions) with some added niceties, such as for joining (eg "select u from User u where u.manager.location.country.code = 'UK'").

Secondly, it has something called the 'criteria API' [1] which is basically a DOM for JPQL. So not SQL, but close. I worked on a large, byzantine financial application where we did a lot of building up of queries using this API. It worked pretty well as a way of separating the logic for different aspects of queries: we had one bit of code that set up the core of a query to limit results according to the user's permissions, another bit that would apply temporal filters, another bit that would apply subject filters, an optional bit that would take a query for trades and turn it into a query to find stocks involved in those trades, etc.

The API itself is not massively typesafe (the programmer specifies an attribute with a string name and type, so they can get both the name and the type wrong if they like), and it is rather verbose (of course). There is an extension called the 'metamodel', where you can automatically generate classes which have ready-made bits and bobs which describe your schema, and saves both typing and errors. Using that, a query to find all trades on a particular stock could look like:

        @PersistenceUnit EntityManager em; // someone will dependency inject this

        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<Trade> q = cb.createQuery(Trade.class);
        Root<Trade> trade = q.from(Trade.class);
        q.select(trade).where(cb.equal(trade.get(Trade_.stockTicker), "NXJ"));
        TypedQuery<Trade> query = em.createQuery(q);
Which is still verbose and stilted in the grand traditions of High Java, of course.

There are some third-party libraries for building JPQL strings which can be used for much the same purposes. Liquidform is quite good:

        Trade t = alias(Trade.class, "t");
        SubQuery<Trade> q = select(t).from(Trade.class).as(t).where(eq(t.getStockTicker(), "NXJ"));
        TypedQuery<Trade> query = em.createQuery(q.toString(), Trade.class);
A key thing here is that the alias function takes a class and, via bytecode sorcery, produces an object that conforms to the type of that class but isn't really an instance of it; instead, its methods act as query primitives. It works much like mocking libraries such as Mockito.

The new hotness, though, is Jinq, which uses Java 8 lambdas to imitate the great .net LINQ. I haven't used it, but i believe the query would look something like:

        List<Trade> trades = streams.streamAll(em, Trade.class).where(t -> t.getStockTicker().equals("NXJ")).toList();
Again, this uses bytecode sorcery under the hood, but this time, it does it by pulling apart the lambda to determine the condition it expresses, and then constructing a corresponding query string. A different kind of sorcery - in AD&D terms, divination rather than conjuration.

[1] http://docs.oracle.com/javaee/6/tutorial/doc/gjitv.html

A good ORM is not a substitute for SQL. They help you with mundane things and you can still write SQL if you want. I like the approach of RedBeanPHP (www.redbeanphp.com).

+1 for redbean - most of my colleagues hate it, but I've found it's a good balance between getting rid of a lot of boilerplate, but not getting in the way when I need straight SQL.

Came here to say this. RedBeanPHP is a great ORM with several advantages over stuff like Doctrine:

* it doesn't require you to learn a new language. The API is very simple

* it encourages using SQL when needed. Need to add a special WHERE clause? No problem. Want to make a custom select? Just use the R::get() method

* it doesn't alter your schema more than it should. There is only one rule - linked table names (m:n relations) use an underscore. That's it.

I wrote "raw" SQL for many many years before using ORMs, so I feel a lot of this guy's pain and agree with most of his points. Especially the part about still having to know SQL even though you're using an ORM.

Not sure I understand his solution to this one, though!

  Window functions are relatively advanced SQL that is painful 
  to write with ORMs. Not writing them into the query likely 
  means you will be transferring a lot of extra data from the 
  database to your application.

  In these cases, I've elected to write queries using a 
  templating system and describe the tables using the ORM.
Templating system? Whaaaat? Here's what I do in ActiveRecord + Postgres and/or SQLite. (Edit: OK, I think I understand! Other people in this very HN thread have discussed templated approaches, like https://news.ycombinator.com/item?id=8134170 )

1. I stick to ActiveRecord whenever possible for the simple stuff. It's really good for like 75% of what I want to do. Being able to chain scopes and things is neat and a good example of something you can't easily do in SQL.

2. When I need to work some SQL magic that ActiveRecord can't do (or can't easily do) that's when I write a SQL view or function. So my class might be something like:

  # Example is Ruby+ActiveRecord but concepts should be applicable elsewhere
  class SomeClass < ActiveRecord::Base

    def some_simple_method_that_uses_activerecord
      SomeClass.where(foo: bar).order(:baz)
    end

    def some_method_that_needs_sql_magic
      SomeClass.find_by_sql "select a,b,c,d,foo,baz from some_crazy_view_or_table"
    end
  end
This has been working pretty well. For the simple CRUD stuff that ActiveRecord is good at, I use ActiveRecord's built-in query interface. For the other stuff, it's easy to write a SQL view/function and simply reference that. It's not perfect because obviously you risk having too much of your app logic live in the database layer... although, then again, I feel equally icky about having too much of the data logic live in the app layer.
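As a minimal sketch of the templated-window-function approach the quote above describes, here is how it might look in Python with the stdlib sqlite3 module (requires a SQLite build with window-function support, i.e. >= 3.25; the table, column, and query names are all made up):

```python
import sqlite3
from string import Template

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('alice', 10), ('alice', 30), ('bob', 20), ('bob', 5);
""")

# The window function lives in a SQL template rather than in ORM method
# calls; the ORM (or a table registry) supplies only the identifiers.
RANKED = Template("""
    SELECT customer, amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
    FROM $table
""")

rows = conn.execute(RANKED.substitute(table="orders")).fetchall()
# Filtering in SQL (rnk = 1) means only the top row per customer crosses
# the wire, instead of transferring everything and filtering app-side.
top = sorted((c, a) for c, a, r in rows if r == 1)
```

The query logic stays in plain SQL, where window functions are natural, and the application only substitutes names it already knows from its model layer.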

Let's talk practicality rather than sitting in some ivory tower and muttering about best practices.

With AR, if this is the worst case where you write some raw SQL, what's the alternative? The alternative seems far more painful pragmatically speaking and this, while being a little ugly, seems to work just fine without impacting productivity or performance.

  > With AR, if this is the worst case where you write some
  > raw SQL, what's the alternative? The alternative seems
  > far more painful pragmatically speaking and this, while
  > being a little ugly, seems to work just fine without
  > impacting productivity or performance.
Do you know if other ORMs allow this kind of relatively painless use of "raw SQL?"

I've only used ActiveRecord and some of the .NET "micro ORMs" like Dapper and PetaPOCO.

Dapper and PetaPOCO definitely support raw SQL. That's kind of their main focus - they take a row of SQL results and map those database columns to your class's properties, and by design they don't do a whole lot else.

But I don't know about other big/popular ORMs like Hibernate...

Even the bulkiest ORMs allow you to use raw SQL. That way you can use the ORM's features for 80-90% of your work on a regular basis and hand-tweak the regions that cause performance problems, or the places where you just have to write SQL (e.g. recursive queries).

In EF, there's either: Database.SqlQuery<T> - http://msdn.microsoft.com/en-us/library/gg696545%28v=vs.113%...

which can return any object or: DbSet<T>.SqlQuery - http://msdn.microsoft.com/en-us/library/gg696332%28v=vs.113%...

which returns tracked entities, so you can write raw SQL (e.g. call a stored procedure) and just use the 'mapper' part of the framework.

NHibernate has CreateSQLQuery - http://www.nhforge.org/doc/nh/en/#querysql-creating

I like micro-ORMs, but when you want to skip writing tedious INSERT or UPDATE queries, you have to add extensions to them (at least to Dapper); that, and SQL strings do not lend themselves well to refactoring or type safety...

Oh man, that's great to know. Thank you for that reply!

No problem! Happy to help :)

ORMs get a lot of flak and while some of it is truly earned, the rest comes from the misuse/abuse of the tool. I always thought that using ORM functionality (where convenient) together with SQL (where necessary or convenient for different facet of the application) was the best from both worlds.

And then you can of course mix different ORMs in one project: use EF in areas where performance doesn't really matter that much or where you're doing a lot of CRUD, and Dapper (or something like Insight.Database if you like stored-procedure-to-interface mapping) in hot paths or analytics-heavy pieces.

In Django, for me the killer feature of the ORM is that it's (mostly) database agnostic, which means that you can use Postgres in production and in-memory sqlite when testing, which makes testing a gajillion times faster.

If you start writing custom SQL you have to introduce horrible bodges to work with whatever database is in use.

That makes testing faster, sure, but then when you deploy your app to production you can get lots of weird errors because SQLite is weakly typed and Postgres is strongly typed, for example. Using the same DBMS in dev and prod is a best practice for this reason, even if it makes your test suite run slower.

Database portability is a poor argument for using ORM in my opinion.

In the real world, most apps don't change database engines during their lifetime. You wait for the rewrite and then you pick a new engine.

If you stick to ANSI compliant SQL, you should be fine when porting over databases. It's not perfect but it'll get you most of the way.

> In the real world, most apps don't change database engines during their lifetime.

Perhaps not, but a lot of times you may have to support multiple database engines at once (this is more true of e.g. middleware than an end-user application obviously).

> Database portability is a poor argument for using ORM in my opinion.

Especially since SQL abstraction layers exist that are not also ORMs (e.g., Sequel for Ruby -- which, yes, has an ORM included, as well.)

> If you stick to ANSI compliant SQL, you should be fine when porting over databases.

This would be true if most real RDBMSs implemented a superset of any particular ANSI SQL standard, but that's not actually the case. Most real SQL-based RDBMSs support different subsets of the standard plus different extensions, so it's not all that common that non-trivial code ports well without conversion.

I like this in theory, but in practice I have found that the differences between databases make this not work as easily as it should. I often end up seeing subtle behavior differences when using different DBs, which can be problematic when something works fine in development and fails in production.

That's supposed to be the ideal case. The problem is that in reality, once you get to the more advanced features, each database's SQL dialect differs slightly, and it's anything but agnostic (if the feature is even there to start with).

Take UPSERT for example. Postgres didn't have upsert until lately (I'm not actually sure if it's in stable yet), and MySQL's equivalent is INSERT ... ON DUPLICATE KEY, which behaves differently from a normal upsert. The ORM solution? Don't implement it at all. In the end, you often end up with a mixture of bizarre ORM code + SQL.
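To make the divergence concrete, here's a hedged sketch using Python's stdlib sqlite3: SQLite (>= 3.24) and Postgres (>= 9.5) both spell upsert with ON CONFLICT, while MySQL's ON DUPLICATE KEY UPDATE has different semantics, which is exactly the kind of gap an ORM struggles to paper over. The table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER)")

# SQLite/Postgres dialect; the same intent in MySQL would be
# INSERT ... ON DUPLICATE KEY UPDATE hits = hits + 1, with subtly
# different behavior (e.g. reported affected-row counts).
UPSERT = ("INSERT INTO counters VALUES (?, 1) "
          "ON CONFLICT(name) DO UPDATE SET hits = hits + 1")

for _ in range(3):
    conn.execute(UPSERT, ("home",))

hits = conn.execute(
    "SELECT hits FROM counters WHERE name = 'home'").fetchone()[0]
```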

The best compromise I've seen is probably Korma for Clojure (http://sqlkorma.com/). It's not really an ORM, since object mapping isn't particularly necessary in Clojure (with map manipulation and all that), but it still provides some of the conveniences of a normal ORM (implicit relational joins). And yet, many times, once you get to the truly rarely used parts of the db ... there is nowhere to go but raw SQL.

Have you run your (mostly) database agnostic app at a large scale?

When your app is running at scale, did you require any custom SQL, or anything non-standard to the ORM, to get it running well?

In my experience with Rails, the "database agnostic" features are for prototyping, and by the time you're a real app moving towards some idea of scale, there is nothing "agnostic" about your ORM and database code...

A fair point, I might be unduly influenced by having done a lot of work on a huge, unwieldy internal app. Running the tests in postgres took literally half a day, and there's no way we could have worked like that.

In that case trading some db efficiency for actually being able to do work was a pretty easy decision, but horses for courses I guess.

I've combined Postgres views and set returning functions in rails and had a lot of success with good performance and good maintainability, too. No custom SQL required in the application itself.

I definitely agree with the crux of the article (that it's important to know SQL if you want to program with an ORM), but not for the same reasons as the author. The reason I agree that it's important to know SQL is that not knowing SQL will in some ways limit what you can do with your applications. This may never affect you based on your use cases, but there may be times when it does, and not knowing how to write queries will lead you to believe that things can't be done with your application that could be.

In the article, it sounds like he's building some reports from a database with a framework, and is challenged because there are massive amounts of data that work best for these specific reports in an unnormalized layout (this is where the attribute creep comes into play). If you use an ORM to try and build these reports on the app side as opposed to the database side, then you will struggle.

However, my take on this is pretty simple. Don't build the complex logic in aggregating the report data on the app side. Use materialized views on the database side, and use your app to do a dumb retrieval of the data from the view with the ORM. The complexity will lie in the materialized views, so your ORM queries can actually still be quite simple and your performance won't suffer.
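A rough sketch of that split, using Python's stdlib sqlite3 (which has only plain views, so one stands in here for a real materialized view; in Postgres this would be CREATE MATERIALIZED VIEW plus a scheduled REFRESH). All names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 100), ('east', 50), ('west', 70);

    -- The aggregation complexity lives database-side, in the view.
    CREATE VIEW region_totals AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

# The app-side "query" stays a dumb retrieval -- exactly what an ORM
# is good at; here a raw one-liner stands in for the ORM call.
totals = dict(conn.execute("SELECT region, total FROM region_totals"))
```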

Obviously, this route would require knowledge of SQL to implement, which brings me back to my original point. ORM developers should have an understanding of SQL.

Long story short, for specific use-cases, it's good to know SQL. For most it doesn't really matter.

I think the problem is when the ORM influences/encourages particular schema designs: when you no longer see tables as tables (which is storage) and instead see your database tables as instances of objects (how you would like to consume the data).

ORMs (Rails/AR in particular) make it very difficult to work with joins and build an object that reads from multiple tables.

One workaround, I think, is to use database views: treat the views as the "instances of objects" and back ORM classes with them.

Java's JPA interface, of which Hibernate is an implementation, makes it pretty simple. Straight from the documentation of EclipseLink (another implementation):

  // query for a primitive
  Query query = em.createNativeQuery("SELECT SYSDATE FROM DUAL");
  Date result = (Date)query.getSingleResult();

  // query for a pair of primitives (bletcherous, but not disastrous)
  Query query = em.createNativeQuery("SELECT MAX(SALARY), MIN(SALARY) FROM EMPLOYEE");
  List<Object[]> results = query.getResultList();
  int max = ((Number) results.get(0)[0]).intValue();
  int min = ((Number) results.get(0)[1]).intValue();

  // query for a mapped object
  Query query = em.createNativeQuery("SELECT * FROM EMPLOYEE", Employee.class);
  List<Employee> result = query.getResultList();

I hardly use an ORM anymore these days, however this article has a few issues.

First, even when using SQL, the author runs into a problem where splitting the database in two and having a dedicated reporting database would be much more efficient than having one database trying to meet all your needs. When you do writes you want transactions and third normal form, but when reporting, third normal form becomes a downside. This applies to both attribute creep and data retrieval.

Second, the dual-schema problem is one that I think most ORM users know how to avoid. I generate the schema from the code directly, maybe with a little bit of fluent migrations to help move data.

The issue with transactions is a strange one. Ideally this is handled as a cross cutting concern in your application. This means it's consistent and transactions can be explicit and predictable. I'd do the same thing for any application.

The biggest issue here is I think the author has chosen the wrong tool. This application sounds like it would be well suited for event sourcing. I'm not going to go into it here but they solve these issues in an interesting way. Plus the data is event-based anyway.

My favorite pattern, which I use on almost all personal code is the phrasebook, implemented here in perl: http://search.cpan.org/~rani/Class-Phrasebook-0.88/SQL/SQL.p...

Basically it's "give your SQL a symbolic name, define how your code interacts with that SQL, and use it as such."
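The linked module is Perl, but the pattern itself is tiny; a hypothetical Python rendering with the stdlib sqlite3 module might look like this (all names invented):

```python
import sqlite3

# The phrasebook: every query gets a symbolic name, and callers
# interact with names rather than building SQL strings inline.
PHRASEBOOK = {
    "user_by_id":  "SELECT id, name FROM users WHERE id = ?",
    "users_named": "SELECT id, name FROM users WHERE name = ?",
}

def lookup(conn, phrase, *args):
    """Run the named query with bound parameters."""
    return conn.execute(PHRASEBOOK[phrase], args).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])

rows = lookup(conn, "user_by_id", 1)
```

The SQL stays SQL, but the call sites read like method calls, and all the queries live in one greppable place.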

Mind you, ORMs can be awesome, but in my experience ORMs are fantastic at implementing the parts of an application that aren't the competitive advantage of your application. The parts that actually make my code special, the parts where fundamental assumptions regarding codes interaction with data fall apart... those are the parts where ORMs don't particularly shine, especially as time moves forward.

For those of us on Java, I've grown to love http://jdbi.org/. JDBI has a lot of features that are convenient (e.g. auto-mapping of columns to a POJO), but synthesizes DAOs for you from interfaces annotated with SQL queries, e.g.

  public interface TripDAO {
      @SqlQuery("SELECT trip_start AS start, MAX(timestamp) AS end FROM location_updates WHERE trip_start = :start GROUP BY trip_start ORDER BY trip_start DESC")
      public Trip trip(@Bind("start") Date tripId);
  }
It also allows you to plug in providers for mixing in Optional types, PGgeometries, whatever you can fit into an interface.

I've been using JDBI for a new project for about a month or so and I like it a lot as well. I will say that the documentation is a bit sparse though, or at the least hard to find by digging through blog posts. When I figure out how to do what I'm trying to do however, I love it.

For example, it took me a while to find out that @CreateSqlObject was a thing and that it solved any hackery I was trying to do to have my DAOs reference one another. Or if you google "jdbi transactions," you aren't led to a page that actually shows you how to use @Transaction. I feel like the only thing holding back widespread use is some better documentation.

I'll agree with that. I had to really dig through the user group to find answers to a lot of things. e.g. @SingleValueResult if you want to return an Optional<Item> from a query.

I didn't know about @CreateSqlObject, thanks!

Been using http://commons.apache.org/proper/commons-dbutils for the same purpose. Works well when I don't need the sledge-hammer a full ORM framework can be. Will look into JDBI as well next time.

I've found the nirvana with MyBatis https://code.google.com/p/mybatis/

I've benchmarked it and it adds roughly 3% on top of raw JDBC, and it allows different styles of usage. You can have your POJOs annotated and get mapping for free, or (what I like) you can extract your SQL queries into XML files, name them, and reference them from code with sql.insert("namedQuery", params);

It is super smart when it comes to mapping/aggregations and it is highly extensible to allow the transparent use of custom mappers and logics. Highly recommended.

I used MyBatis at a former company. I liked it, but on the spectrum of ORM-like frameworks, it had just enough non-code configuration that junior developers, or developers who weren't working in that type of code very often, struggled with how to get new data objects persisted. In my opinion, this stemmed from using XML files to define the mappers and queries. Most people just never knew where to look for that stuff, or exactly how it should be used.

In the end, I try to pick the option that is easiest to read and understand. I've had a terrible time with JPA/Hibernate (I hate that the ORM ends up affecting your schema at all), I had a good experience with Mybatis, and so far I'm really liking JDBI (although see my other comment here about documentation). I don't want raw JDBC, but JDBI seems to be the right tradeoff of power/ease of understanding. I think that the annotated use of mybatis would get you this as well, but I haven't used it.

I have also used MyBatis in a few projects and recommend it for applications that are database-centric. I would use an ORM only if the requirements call for object persistence and being DB/SQL agnostic.

"I much prefer to keep the data definition in the database and read it into the application. It doesn't solve the problem, but it makes it more manageable. I've found that reflection techniques to get the data definition are not worth it and I succumb to managing the redundancy of data definitions in two places."

My experience is almost the opposite. I've found that automatic migrations are one of the best features of ORMs (on par with getting reflection on the data structure), and the OO syntax is much better for defining data structures than DDL.

Nowadays I favor ORMs even when I don't use the OO data interface, just because it's easier to maintain.

Do people actually use ORM-generated migration in large production environments? In my experience, schema mods to big production tables need to be planned out carefully and run by hand on the inactive master db (using mysql mmm replication terminology) in order to pull them off without downtime. You can't just have the next app push blindly start doing ALTER TABLE statements. Perhaps there are clever schemes modern ORMs use to package up the migrations to be run on the inactive master? I honestly don't know, but I had the impression that automatic migrations via the ORM were more or less unusable for anything nontrivial.

To be honest, I never tried migrating without downtime[1].

I don't see any reason why applying the changes on an inactive mirror would be difficult. Migrations are fit for diverse environments, since you'd apply them first on the development machine, then testing[2], and only then on the live system. The main issue is what to do with the data generated while you migrate, but you solve that the same way you solve a normal hand-made migration: you ask the ORM to create the operations in the right order, and divide them into batches. It's just more work than a couple of minutes of downtime justified, for me.

[1] Postgres has no problem doing some ALTER TABLE statements on the live mirror, which makes most migrations trivial, ORM or not. I mostly do those live, but monitor closely because it's deadlock-prone.

[2] I think data migration is the only reason I create separated testing environments nowadays. Otherwise, I try to keep development envs complete enough for tests, and just recreate them as needed. It makes everything much faster.

A thousand times this. Especially since somewhat common operations for DDLs are vendor-specific, such as adding certain types of indexes, and ORMs just abstract that away very nicely.

Hibernate can almost be used as the definition of pernicious:

"causing great harm or damage often in a way that is not easily seen or noticed"

The most pernicious thing about Hibernate is the "caching feature" (read as "time-bomb") layer that doesn't write objects right away when modifying an object. So instead of immediately seeing changes in your database, they gradually creep in from other application instances as the write-timeout expires, slowly corrupting your data without any errors occurring. Great idea guys.

So true!

With Hibernate, there are a lot of things happening under the hood. And all you can do is either struggle with Hibernate, or hope that those under-the-hood things won't have a strong impact on your app's performance.

I agree with Woz. Any developers working with RDBMS should learn SQL. ORMs are just wrappers for SQL statements. They are not the same thing, and knowing one without the other means your skills are weak.

Many comments in this thread are painful to read.

Learn to recognise anti-patterns and deal with them ASAP: http://pragprog.com/book/bksqla/sql-antipatterns

Also, learn about red/green/refactor. Refactor mercilessly. Learn both ORM and SQL, understand the technology you're working with.

The "wide tables" problem is easily addressed through "associated data" tables. Don't store everything in the one model. You have one model which is for computation, searching, comparison, etc. You have another model which is for storage of stuff that is only required at report time. This is an ORM-level form of "partitioning". Just avoid "fetch-related" style ORM requests and you're golden.
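A toy sketch of that partitioning, in Python with the stdlib sqlite3 (the schema and names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hot columns the application searches and compares on:
    CREATE TABLE trades (id INTEGER PRIMARY KEY, ticker TEXT, qty INTEGER);
    -- Report-only attributes, partitioned into an associated table:
    CREATE TABLE trade_report_data (
        trade_id INTEGER REFERENCES trades(id),
        settlement_notes TEXT);
    INSERT INTO trades VALUES (1, 'NXJ', 100);
    INSERT INTO trade_report_data VALUES (1, 'T+2, cleared');
""")

# Day-to-day queries never touch the wide, report-only data...
hot = conn.execute("SELECT ticker, qty FROM trades WHERE id = 1").fetchone()

# ...and the join cost is paid only at report time.
report = conn.execute("""
    SELECT t.ticker, r.settlement_notes
    FROM trades t JOIN trade_report_data r ON r.trade_id = t.id
""").fetchone()
```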

It's really worth reading whatever references are available on tuning your particular database for performance. You can take the advice onboard and let the performance advice contribute towards your ORM design.

The SQL-templating approach is tempting, but it will very quickly lead to problems as someone adjusts a template and adds or removes parameters. Do you have tests in place to ensure that the SQL templates still function as expected? How tightly bound is your SQL to one particular vendor's product? Been there, done that, bought the t-shirt. Did you know that MySQL will automatically update the first TIMESTAMP column any time that row is updated? It doesn't even ask!

The non-standard TIMESTAMP behavior is deprecated, btw, and can be disabled via --explicit_defaults_for_timestamp.


The worst part about ORM is that there are so many systems for doing it. Out of six employers, two had their own handrolled ORM systems.

I think what's going on here is that elementary mapping of objects to and from relational tables is really easy and obvious, so people go ahead and implement it. Unfortunately once you get past the easy bits and have to worry about breaking objects across tables, foreign keys, transactions, caching, and other fun stuff, you have a great big mess and you probably messed up some of the edge cases.
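The "easy and obvious" elementary mapping really is a screenful of code, which is presumably why so many shops hand-roll it. A deliberately naive Python sketch (stdlib sqlite3 and dataclasses; no relations, caching, or transactions, which is where the mess begins):

```python
import sqlite3
from dataclasses import dataclass, fields, astuple

@dataclass
class User:
    id: int
    name: str

def save(conn, obj):
    # Naive convention: table name is lowercase class name + 's'.
    cols = [f.name for f in fields(obj)]
    sql = (f"INSERT INTO {type(obj).__name__.lower()}s "
           f"({', '.join(cols)}) VALUES ({', '.join('?' * len(cols))})")
    conn.execute(sql, astuple(obj))

def load(conn, cls, pk):
    row = conn.execute(
        f"SELECT * FROM {cls.__name__.lower()}s WHERE id = ?", (pk,)).fetchone()
    return cls(*row)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
save(conn, User(1, "ada"))
u = load(conn, User, 1)
```

Twenty lines gets you round-tripping; foreign keys, lazy loading, identity maps, and caching are where every hand-rolled version diverges and breaks.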

The main problem with ORMs is that they keep the database model in the code rather than in the database. I know that is what some people like; I myself used to think that was nice.

What I eventually learned was what Linus Torvalds said: http://programmers.stackexchange.com/questions/163185/torval...

As long as your data structures are good and clear, the code to handle them seems almost obvious.

Here's the quote: "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

Wow, thanks for posting the excellent link.

> for me, ORMs are more detriment than benefit

Well, that's not a great start to the article. There are dozens of different ORMs, so unless you get more specific, this makes as much sense as saying "For me, programming languages are more detriment than benefit".

All ORMs are not equal, and they operate at various levels of abstraction. Some are very close to SQL, others sit very high on the abstraction chain, and each project might benefit from a different one based on a lot of criteria.

This is why I love Dapper micro-ORM in the .NET world. It gives you the best of both worlds.

You are still close to SQL, but you get objects back. Started by Sam Saffron (StackOverflow) and used by StackOverflow themselves, it is fast, well written, concise and easy to use.

Basic usage:

   IEnumerable<Account> resultList = conn.Query<Account>(@"
       SELECT *
       FROM Account
       WHERE shopId = @ShopId",
       new { ShopId = shopId });
Performance of SELECT mapping over 500 iterations - POCO serialization:

   Hand coded (using a SqlDataReader)	47ms
   Dapper ExecuteMapperQuery<Post>	49ms
   ServiceStack.OrmLite (QueryById)	50ms
   PetaPoco				52ms
   BLToolkit				80ms
   SubSonic CodingHorror		107ms
   NHibernate SQL			104ms
   Linq 2 SQL ExecuteQuery		181ms
   Entity framework ExecuteStoreQuery	631ms
- https://code.google.com/p/dapper-dot-net/

- http://www.tritac.com/bp-24-dapper-net-by-example

- http://en.wikipedia.org/wiki/Dapper_ORM

- http://code.google.com/p/dapper-dot-net/source/browse/Tests/...

These annoyances are familiar enough, but I think that the author is underselling the benefits of this kind of abstraction. For one project I'm currently using an ORM (NHibernate) with several patterns depending on the requirements and the context:

- With single-table queries and simple joins, having a repository available means that these are a joy to write and maintain with LINQ. And with some preparation the mappings can prevent all sorts of runtime issues.

- Complicated queries which might be tricky to generate optimized SQL for can be written in HQL, meaning that the data access code does not have to be coupled to one brand of persistence store.

- Really quirky stuff that only works in Oracle, for example, can be written raw with CreateSQLQuery(). Here the ORM is basically functioning as an extremely bloated DB session manager, but this is not very often.

I definitely agree that familiarity with the database query language is needed to build non-trivial applications, but at the same time I would not enjoy building many of those applications without the option to query a table in one line of type-safe code. Some folks might say that I'm not using all the features of the ORM unless I have every single foreign key relationship mapped out to lazy-loaded collections and such, but I'm not really comfortable enough with the technology to want to do that, and I don't really mind the extra bit of housekeeping required to handle those cases.

Don't get me wrong, if you must prioritize raw performance or need all the flexibility of your query language, the right decision might be to ditch your ORM. With my current project, stability and maintainability are much more important, so I'll be sticking with the convenient tools for now.

Use the right tool for the job. If accessing a meaningful subset of your data requires 14 joins and returns 600 attributes, your data model probably isn't going to be friendly to many ORMs. Just write the SQL, maintain the sprocs and tvfs, and give up on trying to fit everything neatly into an object-to-table mapping.

But if you're tired of rewriting the same, CRUD-like SQL over and over again, and want a common framework for accessing similarly structured data, you'll inevitably write an ORM (however minimal) and then you're facing the same problems that every ORM has attempted to solve to date.

The biggest danger in using an ORM is that it will constrain how you structure your data model. It took me a long time to become comfortable with using (materialized) views to back my models, or using multiple models for a single table. But being able to divorce my data model from my conceptual model has helped me immensely, in terms of avoiding a lot of the problems the author is talking about. Rails' ActiveRecord is surprisingly friendly when it comes to these sorts of things.

Like anything, you always need to learn at least what is going on one layer below the abstraction layer you are working in. So if you are using an ORM you need to learn SQL. If you are using ruby or python you need to learn what the interpreter is actually doing. If you are using C you need to know what the machine code is doing, etc.

Well I love SqlAlchemy, and DBIx::Class

I also like true RDBMS.

The most important features of an RDBMS for me are relationships, and the «ON DELETE/ON UPDATE» actions combined with foreign keys and constraints. They make it possible to have complex relationships that stay consistent and that can be added/removed without destroying the consistency of the state in the DB. Most ORM-only people I know overlook this part, which is a must for transactionality. Your data stays consistent over time.

Also, people tend to think querying with an ORM is cool, but they often treat the DB as random-access records. For me, complex queries are more like intersections, symmetric differences, and unions of records based on their joins. And SQL is so good at that.
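For instance, those set operations are one-liners in SQL; a small sketch with Python's stdlib sqlite3 (tables invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buyers  (customer TEXT);
    CREATE TABLE sellers (customer TEXT);
    INSERT INTO buyers  VALUES ('alice'), ('bob'), ('carol');
    INSERT INTO sellers VALUES ('bob'), ('carol'), ('dave');
""")

# Records as sets: who both buys and sells (intersection)...
both = [r[0] for r in conn.execute(
    "SELECT customer FROM buyers INTERSECT "
    "SELECT customer FROM sellers ORDER BY customer")]

# ...and who only buys (set difference).
only_buy = [r[0] for r in conn.execute(
    "SELECT customer FROM buyers EXCEPT "
    "SELECT customer FROM sellers ORDER BY customer")]
```

Expressing the same thing through an ORM's record-at-a-time API usually means fetching both sides and intersecting in application code.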

ORMs can have hooks on transactions, making it possible to call a web service on commit and rollback.

ORMs can have mixins to help handle passwords carefully.

So, in my opinion, what is wrong is neither SQL nor the ORM, but the idea that only one is enough to learn.

The problem with raw SQL queries is that they don't compose. Using an ORM, I can do two things.

1. Pass around query objects, and build queries derived from others. I can also combine multiple queries into one and split the results.

2. Update model records in multiple places passing them through several layers of business logic before serializing.

This is on top of the other obvious benefits of ORMs, such as abstraction over my storage engine. I can write a single 'query' that can be executed against a variety of SQL servers, Salesforce, MongoDB, an in-memory cache, or whatever else I want to do with it.
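In miniature, the kind of query composition meant here could look like the following toy Python sketch (this is not the poster's actual system; sqlite3 stands in for the storage engine, and all names are made up):

```python
import sqlite3

class Query:
    """Toy composable query: each rule adds a predicate; SQL is built once."""
    def __init__(self, table):
        self.table, self.preds, self.args = table, [], []

    def where(self, clause, *args):
        # Immutable-style composition: return a new query, leave this one alone.
        q = Query(self.table)
        q.preds = self.preds + [clause]
        q.args = self.args + list(args)
        return q

    def sql(self):
        where = " AND ".join(self.preds) or "1"
        return f"SELECT * FROM {self.table} WHERE {where}", self.args

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER, skill TEXT, urgent INTEGER)")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)",
                 [(1, "sql", 1), (2, "sql", 0), (3, "orm", 1)])

# Two independent "rules" fold their constraints into one database hit:
base = Query("jobs")
q = base.where("skill = ?", "sql").where("urgent = ?", 1)
sql, args = q.sql()
rows = conn.execute(sql, args).fetchall()
```

With raw SQL strings, each rule would either run its own query or have to splice text into someone else's statement; a query object makes the merge explicit and safe.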

As a real-world example of why this matters - On my current project, I have a heuristic algorithm that allocates pending jobs to workers that meet the required skillset. As part of this, I have individually defined rules that must be met. As part of the operation, each rule can query the database for relevant results.

Each rule is a standalone module, and by design cannot have any knowledge of other rules. Rules can be written by third parties and are plugged in and loaded at runtime. To make this work, we can either

1. Run one query per rule, hitting the database more than needed

2. Compose all the rules into one query, hitting the database once.

Using an ORM, I'm able to take a base query and fold it through all the available rules, allowing each one to return an updated query to be merged. Some rules conflict - they may require results that another rule has filtered out. To solve this, the ORM will automatically detect the conflict and build a single 'Query' object that compiles to a broader query behind the scenes, takes the results, stores them in an in-memory cache, and then runs the individual rules in-memory against them to get a final resultset. In the worst case scenario where this is not possible, it will compile to the minimum possible number of SQL queries to satisfy all of the individual callers.
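For the curious, the fold-the-rules idea can be sketched in a few lines of Python. This is a hypothetical toy (sqlite3, a made-up `jobs` schema and rules), not the actual Salesforce ORM described above: each rule contributes a SQL fragment plus its own in-memory predicate, the fragments get OR'd into one broad query, and each rule then filters the shared resultset in memory.

```python
import sqlite3

# Hypothetical rules: each contributes a SQL fragment (for the broad
# query) and an in-memory predicate (its own exact filter).
RULES = [
    {"sql": "skill = 'welding'", "pred": lambda r: r["skill"] == "welding"},
    {"sql": "priority >= 2",     "pred": lambda r: r["priority"] >= 2},
]

def compose(rules):
    # Broaden: OR all fragments together so one query covers every rule.
    where = " OR ".join(f"({r['sql']})" for r in rules)
    return f"SELECT * FROM jobs WHERE {where}"

def run(conn, rules):
    conn.row_factory = sqlite3.Row
    superset = conn.execute(compose(rules)).fetchall()  # one round-trip
    # Each rule then filters the shared superset in memory, getting
    # exactly the rows its own query would have returned.
    return [[{k: row[k] for k in row.keys()} for row in superset
             if rule["pred"](row)] for rule in rules]
```

Any row matching a rule's predicate also matches its SQL fragment, so the broad query is guaranteed to contain every rule's resultset.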

As a result, each rule can run a different query against the database, getting the individual resultset it wanted, while not hitting the database so heavily.

Why not just query multiple times? In this case, we're running this against a Salesforce database. On top of the fact that you pay per API call, there's anywhere up to 2 seconds of latency before getting a result. Composing the queries means we take an operation that might have taken a few minutes and used a lot of expensive API calls into an operation that takes a few seconds and uses 1 API call.

At the end of this I get a resulting immutable object. I can perform business logic on this in multiple places, accumulating changes, at the end of which I have an `update` object containing an original version and a delta. I can then just update the things that actually need updating, and can run additional business rules intelligently based on which fields were changed. If there are multiple records that need updating, the ORM will batch them for me to further reduce API calls.
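The original-plus-delta part can be sketched too. A minimal hypothetical version (not the poster's actual code; table and key names are made up):

```python
# Sketch of the "original + delta" update pattern: accumulate changes
# against an immutable snapshot, then emit only the fields that
# actually changed.

def make_delta(original: dict, modified: dict) -> dict:
    """Return {field: new_value} for fields that differ from the original."""
    return {k: v for k, v in modified.items() if original.get(k) != v}

def to_update_sql(table: str, key: str, original: dict, modified: dict):
    delta = make_delta(original, modified)
    if not delta:
        return None  # nothing changed: skip the round-trip entirely
    assignments = ", ".join(f"{col} = ?" for col in delta)
    return (f"UPDATE {table} SET {assignments} WHERE {key} = ?",
            [*delta.values(), original[key]])
```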

Using raw SQL, it would be possible to implement a rough hack that approximates this, but it would be nowhere near as efficient or scalable.

> This is on top of the other obvious benefits of ORMs, such as abstraction over my storage engine. I can write a single 'query' that can be executed against a variety of SQL servers, Salesforce, MongoDB, an in-memory cache, or whatever else I want to do with it.

This is a trade-off, not an obvious benefit. In programming to a lowest-common-database API, one loses the ability to use any actual features of the specific database technology being used. It would be very interesting to know what proportion of projects need to smoothly change the underlying DB technology (it is of course debatable whether ORM's actually let you do this), vs what proportion find themselves hampered by the less-powerful-than-SQL database manipulation API offered by an ORM.

Here's a question. WHY do we even bother with ORMs? Put another way, what problem are they trying to solve?

It makes the data layer more testable and refactorable by decoupling it from a specific data store. I would trade a horrendously large SP for a horrendously large C# codebase any day.

I'd choose "Door No. 3": neither C# nor stored procedures, but declarative db business rules using appropriate data types, constraints, default values, views, access controls, and triggers.

I agree that SQL is a brilliant data-processing language, while C#, Java, or C++ are terrible at it. And there are obvious benefits to learning SQL and being able to use it efficiently; imho that makes much more sense than throwing away RDBMSs because they're 'slow'. But if you write applications in an OO language, objects are a quite natural way of representing both data and logic. Sometimes you do application-level transaction management or use the mapping metadata to implement higher-level features, sometimes you have to integrate application events with data operations, or you need your data objects to live outside of a database transaction; an ORM saves you lots of coding in such situations. And it adds services like caching, prefetching, and intercepts/triggers so you don't have to reinvent them.

I believe that C# is a better data processing language than SQL is, assuming it can access the data. That's mostly thanks to the strength of linq.

Linq is just a useful syntax, there's nothing about data processing there. It's only good for small data sets that fit entirely in memory.

I completely disagree. SQL is just syntax too. Syntax for a language that can process data, just like linq. And Linq can also operate on IQueryable<T>, which can represent data structures too large to fit in memory. It functions just fine on infinite data structures for that matter too.
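To illustrate the deferred-execution point in Python terms (a loose analogy, not IQueryable itself): a generator pipeline describes a computation without materializing it, so it composes over streams too large, or even infinite, to fit in memory.

```python
import itertools

def naturals():
    # an infinite "table": no engine could materialize this
    n = 0
    while True:
        yield n
        n += 1

# Composing the pipeline runs nothing yet; it is just a description.
query = (x * x for x in naturals() if x % 2 == 0)

# Only on enumeration is work done, and only as much as requested.
first_five = list(itertools.islice(query, 5))  # [0, 4, 16, 36, 64]
```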

I'd rather write Django ORM code than write CREATE TABLE, ALTER TABLE, and migration SQL any day of the week. The developers who wrote Django's ORM are also way better at writing SQL and database-related code than I am. Sure, I could spend all the time it takes to become proficient enough that my hand-written migrations work as nicely as migrations and syncing do in Django and related projects like South, or I could just spend that time writing my application using the Django ORM. If and when problems happen, as they have, I usually find easy ways to solve them. I've been using Django for over 4 years in high-load environments with few frustrations.

Before using Django I wrote my own SQL with web.py, and PHP before that. When I work on non-Python projects I use non-relational databases instead of SQL.

To be honest, I find Django's ORM to be one of the weakest ones (e.g., the API doesn't support a simple GROUP BY). If you want to make a good case for ORMs, Django's may not be a very strong argument.

In about 5 years of working with Django, the ORM has been the only component that consistently gave me headaches. I have filed a couple of bug reports about it generating nonsensical/slow queries too (like generating queries with `DISTINCT` for no reason, with no way to override it).

>the API doesn't support a simple GROUP BY

This is just plain wrong (see aggregate/annotate).

Let's word it differently: the API doesn't support a simple way to emit an arbitrary GROUP BY clause.

The aggregate/annotate APIs do not cover all legitimate uses of GROUP BY besides very simple cases (sum, avg, etc).
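For concreteness, here is the sort of GROUP BY I mean: grouping by an arbitrary expression and filtering on the aggregate. It's trivial in SQL (sketched below with sqlite3 and a made-up `orders` table) but awkward to express through the annotate API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, placed_at TEXT, total REAL);
    INSERT INTO orders VALUES
        (1, '2014-01-05', 10.0), (2, '2014-01-20',  5.0),
        (3, '2014-02-02', 40.0), (4, '2014-02-10', 25.0);
""")
rows = conn.execute("""
    SELECT strftime('%Y-%m', placed_at) AS month,  -- group by an expression
           COUNT(*)   AS n,
           SUM(total) AS revenue
      FROM orders
     GROUP BY month
    HAVING SUM(total) > 20                         -- filter on the aggregate
     ORDER BY month
""").fetchall()
# only February clears the revenue threshold: [('2014-02', 2, 65.0)]
```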

I wonder if one's stance on this issue moves in cycles as one gains experience. You start off with the ORM cause it's easy, then you make a mess and decide to be rigorous which includes coding all-SQL, then you realize how tedious that is and why ORMs came about in the first place.

Most people who complain about ORMs complain about the input - the querying aspect of them. And it is true, particularly for complex queries ORMs can be pretty hideous. If a query goes beyond a certain level of complexity I would much rather replace them with a series of views or stored procedures. ORMs are not good for complex queries.

For me, where ORMs do shine is with their output. If you have two tables, A and B, with a one-to-many relationship between them, then with pure SQL a join on these tables returns a single flat result set: table A's data is duplicated for each row of B. With an ORM you can get back a single object A containing a collection of B's rows.
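A sketch of that fold, using sqlite3 and a made-up pair of tables: the join repeats A's columns per B row, and `groupby` stitches the flat resultset back into nested objects, which is roughly the service an ORM performs for you.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, a_id INTEGER, label TEXT);
    INSERT INTO a VALUES (1, 'first'), (2, 'second');
    INSERT INTO b VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 2, 'z');
""")
flat = conn.execute("""
    SELECT a.id, a.name, b.id, b.label
      FROM a JOIN b ON b.a_id = a.id
     ORDER BY a.id, b.id
""").fetchall()                       # A's columns repeat once per B row

# Fold the duplicated rows back into one object per A.
# (groupby needs input sorted by the key, hence the ORDER BY above.)
nested = [
    {"id": a_id, "name": name,
     "bs": [{"id": b_id, "label": lbl} for (_, _, b_id, lbl) in group]}
    for (a_id, name), group in groupby(flat, key=lambda r: (r[0], r[1]))
]
```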

This is enough reason for me to reach for an ORM for anything but the simplest of problems.

Lazy loading generally also comes for free.

I think ORMs like SQLAlchemy are really useful for many, many use cases. I don't think most people who work with ORMs deal with the kind of complexity described by the author, let alone work on such a specific application for 30 months at a time. In that sense, ORMs are super powerful tools that cut down your work, shorten your code, and do nifty optimizations once in a while.

With that being said, this article totally resonates with me, having resorted to using direct SQL in almost all of my large-ish projects. Using an ORM always starts out nicely, and then eventually it gets messier and messier. Specifically, when your DB data has subtle interactions with a cache, I have found it harder to keep ORM code clean and readable.

SQLAlchemy isn't like most ORMs. Its data model is actually closer to the relational model than to OOP objects, and it therefore lacks many of the usual ORM warts.

I wish more ORMs were like SQLAlchemy.

Dynamic typing is quite good for this sort of integration scenarios.

The one thing I like about ORMs (where I control the ORM): I can prevent any SQL query from being run that doesn't have a WHERE statement.

Especially when dealing with anything that changes data, lack of WHERE is very dangerous.
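A minimal sketch of such a guard, checking the SQL text (a real ORM would inspect its own query tree instead, which is what makes this cheap to enforce there; a text check is only approximate):

```python
def guard(sql: str) -> str:
    """Refuse to run an UPDATE or DELETE that has no WHERE clause."""
    stmt = sql.strip().lower()
    if stmt.startswith(("update", "delete")) and " where " not in stmt:
        raise ValueError(f"refusing unbounded statement: {sql!r}")
    return sql
```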

That said, I write raw SQL. It's the only way to get the performance. As an abstraction it's already great and I don't feel I need another abstraction on top of it.

An example of the kind of query that an ORM is bad at:

    SELECT id
          ,value_from_row_based_functions(some_column) AS foo
      FROM bar
     WHERE some_column = 'blah'
It's bad because the ORM will merrily do this even though the row-based function may be expensive. Over hundreds of rows this isn't an issue, but over millions or tens of millions it kills your query (if it doesn't time out, it will merrily carry on for hours).

Good SQL always reduces the amount of work involved, you tailor it as you would your regular code to avoid doing work that isn't needed (reading rows you will discard through a filter, etc).

The above could be written as:

    SELECT id
          ,value_from_row_based_functions(some_column) AS foo
      FROM (
                SELECT id
                      ,some_column
                  FROM bar
                 WHERE some_column = 'blah'
                 ORDER BY id DESC
                 FETCH FIRST 25 ROWS ONLY
           ) AS something
And now the row based function is only applied to the 25 rows that are going to be returned.
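One way to settle this kind of dispute empirically on a given engine is to register the "expensive" function as a UDF that counts its own invocations and compare both query shapes. Here is a sqlite3 harness (note SQLite spells the standard FETCH FIRST as LIMIT; the counts will differ per engine and planner, which is rather the point):

```python
import sqlite3

calls = 0
def expensive(v):
    global calls
    calls += 1          # count invocations of the "row-based function"
    return v.upper()

conn = sqlite3.connect(":memory:")
conn.create_function("expensive", 1, expensive)
conn.execute("CREATE TABLE bar (id INTEGER, some_column TEXT)")
conn.executemany("INSERT INTO bar VALUES (?, 'blah')",
                 [(i,) for i in range(1000)])

calls = 0
conn.execute("SELECT id, expensive(some_column) FROM bar "
             "WHERE some_column = 'blah'").fetchall()
naive_calls = calls     # one call per matching row

calls = 0
limited = conn.execute("""
    SELECT id, expensive(some_column)
      FROM (SELECT id, some_column FROM bar
             WHERE some_column = 'blah'
             ORDER BY id DESC LIMIT 25) AS something
""").fetchall()
print(naive_calls, calls)   # whether the second count is 25 or 1000
                            # depends on the engine's planner
```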

If you look at the very bottom of this: http://www.postgresql.org/docs/9.2/static/textsearch-control... you'll see this is a real-world problem.

ORMs can stitch together your SQL in a certain way that is fairly optimal for most common scenarios... but you will encounter a scenario in which raw SQL is the only way to go.

I'm almost 100% certain that the function will be called only 25 times no matter which way you write the query. Not only is the SELECT list semantically evaluated at the end, but any engine worth anything optimizes this kind of thing.

You are 100% incorrect when it comes to Postgres. Feel free to try it out.

I think, the problem is in all-or-nothing approach when discussing that topic. It's always "abstract pure SQL" vs "abstract pure ORM", but I think the solution is in reasonable compromise between them.

First, it's definitely a bad idea to write an ORM that works with absolutely any database schema. If you accept some limitations in schema design (and some in the model layer too), you can really benefit from a simpler ORM design.

For instance, you can eliminate partial loads entirely by breaking stored entities into separate aspects in separate tables and using some tricks for fetching associations (see below).

Second, ORM should be as lightweight as possible, so you can really use it and not fight it.

There are things that just should not be used in an ORM if we don't want to end up solving complicated performance problems. I believe it's a really bad idea to use auto-generated SQL JOINs on the ORM side; they are very hard to control and optimize. There is a simple and clever solution by Jakub Vrana (https://www.facebook.com/jakubvrana/posts/415359675151430) that works really well if implemented carefully on the ORM side. When you really need a JOIN, you can use a VIEW on the database schema side or (better) use some kind of denormalization.
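As I understand the linked trick (my own sketch of it, with a made-up post/comment schema): instead of a JOIN, fetch the parent rows, then fetch all their children with a single IN (...) query and stitch the association together in the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER, title TEXT);
    CREATE TABLE comment (id INTEGER, post_id INTEGER, body TEXT);
    INSERT INTO post VALUES (1, 'a'), (2, 'b');
    INSERT INTO comment VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 2, 'z');
""")

# Query 1: the parents.
posts = {pid: {"id": pid, "title": title, "comments": []}
         for pid, title in conn.execute("SELECT id, title FROM post")}

# Query 2: all children of those parents at once, no JOIN needed.
marks = ",".join("?" * len(posts))
for cid, pid, body in conn.execute(
        f"SELECT id, post_id, body FROM comment"
        f" WHERE post_id IN ({marks}) ORDER BY id", list(posts)):
    posts[pid]["comments"].append({"id": cid, "body": body})
```

Each query stays trivially simple and easy for the optimizer to handle, at the cost of one extra round-trip per association level.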

You can optimize VIEW on SQL side, you can move from views to denormalization without breaking model layer etc.

Using stored code on SQL side is good, you can maintain your database without any special client tools or code, and you could automate a lot of denormalization using triggers.

We have some slides (in Russian, sorry) on this topic here http://www.slideshare.net/interlabs-ru/model-patterns and here http://www.slideshare.net/interlabs-ru/sustainable-schema and have implemented a lightweight proprietary ORM (in PHP) based on the principles above; we may open-source it later.

ORMs can definitely help. When you have a large team of developers, not everyone does things the same way, or even the right way.

Also, what if you want to change your database structure? Wouldn't an ORM help in that instance as well?

If I have to code a serious application with a complex schema, an ORM is out of the discussion. If I want to kick off a fast prototype project I would consider it. But in the end I know that I will regret it!

Great article!

This is fascinating for me, because I've recently had the opportunity to watch someone from a pure maths background learn SQL from scratch. It seems like a very different experience when you arrive on day one appreciating the underlying theory of relational databases, as opposed to my experience from a more enterprisey background. I didn't realise what a mental block the subset of SQL available to Hibernate had imposed until recently, and I suspect things like ActiveRecord hide even more of the good stuff away.

The first ORM I used was basically a bunch of C++ macros for Oracle, about 16 years ago. I've been using both SQL and ORMs ever since in various forms. I'm not sure how anyone could ever use an ORM effectively without knowing SQL.

Some people only know how to administer their box through X11 apps, but others know the command line. It feels like that when working with devs that do not know SQL very well. They'd be lost without their ORM.

Disclaimer - I have little know how of ORMs and haven't developed anything more than scripts for a long long time.

What happened to 4GLs when the web came? It looks a little like this would be a solved problem if there were an open-source web 4GL that integrated SQL into, say, Python as deeply as SQL is integrated into Oracle PL/SQL or Ingres OpenROAD, whilst providing a decent web app framework to go along with it.

ORMs, like anything, aren't all in the same league. I don't like them blindly, but SQLAlchemy works very well (because of its layered architecture).

I haven't seen anyone say this yet but it seems that this guy used the wrong tool for the job he was doing. He needed to get off of the ORM long before his attribute count got up to 600. ORM is not the right tool to use for his problem. He was trying to fit the glass slipper onto the giant ugly step sister and then wrote an article about how all glass slippers are garbage.

People should try out InterMine. It's an ORM (and other tools) that I wrote a while back, and runs the backbone of a few genetics research databases. Funnily enough, the object-oriented query language shared almost all its syntax with SQL. It also does a load of things to avoid some of the problems in this article.

If it ain't broke, don't fix it.

Can anyone tell me which of the OP's arguments against ORMs isn't already addressed in SQLAlchemy? I can't find any, but maybe I've overlooked something.

Anyone who is already coding in Python and hasn't yet discovered SQLAlchemy would find that doing so is a great investment in the little time that is available.

Ever since we learned about Relational Algebra, I have felt, that as an approach it is superior to both SQL and ORM.

Learning an ORM without learning SQL is great for beginners, e.g. the Django framework for newcomers. But as with any abstraction, learning SQL will allow you to optimize with raw SQL as needed. It's like trying to learn CoffeeScript without learning JavaScript. It always helps to learn from the bottom up.

I disagree. Some of the worst horrors I've seen over my career are systems written by developers using an ORM "so they don't have to learn SQL".

I find some folks have these really strong beliefs:

1. SQL is for DBAs; I'm a developer!

2. SQL is legacy, and if you use it directly you must be some kind of luddite.

3. I tried to write some SQL once by copy/pasting someone else's, and I got really strange results back. Never mind that I didn't try to actually understand the query.

I always let the database do the heavy lifting. I write 100-line SQL queries that, if you turned them into code, would become 1,000 lines. It scares most folks, but whatever; it runs much faster, and once it has been fine-tuned it runs fine without needing much attention. I just comment the hell out of it.

I find ORMs quite handy, and they've always generated tight SQL.

Oh yeah, we had to support an additional DB recently... all that involved was changing the JDBC connection information and dropping in a new driver. Good luck doing that when everything is written in raw SQL.

I usually just restrict the SQL I'm using to something reasonable and I've never had any trouble...

What ORMs have taught me: people will complain endlessly about how ORMs don't work, when they don't bother to understand how they work, what they should/shouldn't be used for, and how to use them appropriately.

I have a laravel project for which all of my models are raw SQL statements for this reason. At the very least, it makes the code more portable, and it makes it easier to reason about the statements when you can actually see them.

It makes your code less portable. Raw SQL means you're essentially tied to a single DB provider. What happens when you need to move to a different one?

Then you should probably rewrite your statements anyways to take advantage of platform-specific features. I won't write a query on MySQL the same as I would on Postgres, because there's usually a better way.

CRUD is about all that can honestly stay the same, as IMO that's where ORMs shine.

Sorry... I meant more portable between applications (assuming you're willing to stick with SQL), obviously not between database providers.

What if you don't need to move to a different one?

> At the very least, it makes the code more portable

How does writing SQL make your code more portable? Using an abstraction makes something portable. SQL implementations differ from one database to another.

Using an ORM ties you to that implementation. I can more easily take a raw SQL statement and use it elsewhere.

"(And hey, with the advent of "devops", the divide between the developer and the database administrator is basically non-existent)"

Not really. Devops is a new role. Not everyone would like to work in devops.

ORMs are a breath of fresh air for developers IMHO. I can concentrate on DB design, use SQLite for dev, and deploy PostgreSQL on the server (production). An awesome level of abstraction :-)

After looking at monstrosities like Entity Framework, I've given up on traditional ORMs. My current choice for DB access is micro-ORMs like PetaPOCO.

Check out EF7; they started from scratch:


Doesn't look that promising: https://github.com/aspnet/EntityFramework/wiki/Using-EF7-in-...

Look at the simplicity and beauty of micro-ORM like PetaPOCO: http://www.toptensoftware.com/petapoco/

The fact is you don't want to put large teams on creating ORMs, because that would invariably mean feature creep and a lot of black magic happening behind the scenes. ORMs should be super lightweight; a single-page doc should be enough to explain everything, and there should be no "magic".

Does anyone have a decent version control system for stored procedures? How do I keep them in my git repository alongside my code? How do I maintain them?

If you use MS SQL Server, Red Gate has a solution for that [0]. I don't know if there is something similar for other RDBMSs.

[0]: http://www.red-gate.com/products/sql-development/sql-source-...

I find the species of SQL in any sufficiently complex application are too varied for a blanket rule. Add performance demands, and some ORMs are right out.

A small nit on the opening:

> ... ORMs are more detriment than benefit. In short, they can be used to nicely augment working with SQL in a program ...

That statement's contradictory.

Pretty sure you are missing one of the biggest rules of ORMs: you don't use them for reporting software or reporting queries.

I've generally found the following mantra to hold true:

ORMs for a single record (and its associations); raw SQL for groups.

SQL is not hard, and certainly no harder than something like Entity Framework.

How about writing a reusable database wrapper class?

You can generalize queries:

    query(sql, args[])

and add functions for specific purposes:

    addName(name): query('INSERT INTO table ...', name)
    otherFunction(...): query('...', ...)
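Fleshed out a little, assuming a hypothetical minimal version in Python/sqlite3 (all names here are made up):

```python
import sqlite3

class Db:
    """Thin reusable wrapper: one generic query(), specific helpers on top."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT)")

    def query(self, sql, args=()):
        cur = self.conn.execute(sql, args)  # parameterized, so no injection
        self.conn.commit()
        return cur.fetchall()

    # purpose-specific helpers are one-liners delegating to query()
    def add_name(self, name):
        self.query("INSERT INTO people (name) VALUES (?)", (name,))

    def names(self):
        return [n for (n,) in self.query("SELECT name FROM people")]
```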

See also from a few years ago: http://seldo.com/weblog/2011/08/11/orm_is_an_antipattern

I agree with basically everything in this post, obviously :-)

There seems to be an underlying assumption that SQL performance is crucial. Of course an ORM is going to produce less efficient SQL, that's the tradeoff. However, object caching may eliminate this concern.

For example, I've worked with a system where each field is a table and data retrieval can require huge inefficient joins just as you'd expect. Due to general revulsion at the idea when it was introduced, there was an initiative to create materialized views. But this completely collapsed when it was discovered that the existing object caching benchmarked just as well whilst removing system complexity. Also worth thinking beyond selects - another very big advantage of table per field is in altering schema on large datasets, which was a major undertaking on more read-efficient schema. I think this is an example of worse being better.
